MIM 2 Data Spaces Graduation Project

by Maxwell Ernst - 18/06/2024

Fontys Logo Eindhoven Logo

Inzicht Verlicht Data: ETL for MIM2 Data Model Creation¶

This Jupyter notebook documents the ETL (Extract, Transform, Load) process for preparing Inzicht Verlicht data to be used in building MIM2 data models for sharing using the Smart Data models by FIWARE. The steps outlined here will be used as a running example in creating a MIM2 creation process document, ensuring an efficient approach to data preparation for future MIM2 development.

This presentation explains Smart Data Models in detail: https://docs.google.com/presentation/d/1k021ZzRyk2PtikubOFNa3KyXeXL5nTmWWxyhs0Y0r1k/edit#slide=id.g10849e1f66c_0_499

What is ETL?¶

ETL stands for Extract, Transform, Load. It's a crucial process in data management that involves:

  • Extracting data from its source (in this case, the Inzicht Verlicht data).

  • Transforming the data to meet the requirements of the target system (MIM2 data models). This involves cleaning, formatting, and manipulating the data.

  • Loading the transformed data into a CSV (here, the data will be used to build MIM2 data models).

Benefits of a Documented ETL Process¶

By documenting the ETL process in this notebook, we gain several advantages:

  • Reproducibility: Anyone can follow these steps and replicate the data preparation for future MIM2 projects.

  • Standardization: This ensures consistency in how the Inzicht Verlicht data is prepared for MIM2 usage.

  • Efficiency: Having a documented process saves time and effort by avoiding the need to reinvent the wheel for each project.

  • Clarity: The documentation provides a clear understanding of how the data is transformed, facilitating troubleshooting and potential improvements.

The following sections of this notebook will delve into each stage of the ETL process for Inzicht Verlicht data used in MIM2 data model creation. The image below shows the different MIMs where this project is focused on MIM2 data models and sharing.

Table Of Contents¶

  1. Libraries
  2. Importing Data (Extract)
    • 2.1. Vinotion Data (Car CCTV data)
    • 2.2. Sorama (Sound Sensor Data)
  3. Data Conversions (Transform)
    • 3.1. Vinotion Conversion
      • 3.1.1. Aggregating Vinotion data
      • 3.1.2. Adding columns
    • 3.2. Sorama Conversion
  4. Generating Scripts (Load)
    • 4.1. Vinotion Python Script
    • 4.2. Sorama Python Script
  5. JSON Schema, JSON Input and Testing
    • 5.1. Vinotion JSON
    • 5.2. Sorama JSON
  6. Testing JSON Schema and inputs
    • 6.1. Vinotion JSON Test
    • 6.2. Sorama JSON Test
  7. Conclusion
  8. Recommendations

1) Libraries¶

Loading in the python libraries for the ETL process.

In [ ]:
import pandas as pd
import numpy as np
from IPython.display import Image
from datetime import datetime
import plotly.offline as pyo

pyo.init_notebook_mode()

2) Importing Data (Extract)¶

2.1) Vinotion Data (Car CCTV data)¶

In [ ]:
df_vinotion_full = pd.read_csv (r'/Users/maxwellernst/Documents/data/df_vinotion_with_seconds.csv', sep=',')
df_vinotion_full
Out[ ]:
datetime_short cameraId classification ruleId speed uuid
0 2021-11-02:09:51:39 wss://10.252.229.12/api/v2/data/main/object Car Application/Channel/Count/0/Processing/CountRu... 54.49428 be698d9d-35dd-4588-8f1a-149c116c3973
1 2021-11-02:09:51:40 wss://10.252.229.12/api/v2/data/main/object Car Application/Channel/Count/0/Processing/CountRu... 50.97996 91a1416c-c3f3-4498-9dae-8d32c0ae7457
2 2021-11-02:09:51:41 wss://10.252.229.12/api/v2/data/main/object Car Application/Channel/Count/0/Processing/CountRu... 55.75644 ac92fd4b-807f-4ce9-ac2e-7b9ab0b8fb78
3 2021-11-02:09:51:42 wss://10.252.229.12/api/v2/data/main/object Car Application/Channel/Count/0/Processing/CountRu... 51.20208 bf1a1528-1fe7-4558-b1e0-9f59fc5d4a88
4 2021-11-02:09:51:43 wss://10.252.229.12/api/v2/data/main/object Car Application/Channel/Count/0/Processing/CountRu... 60.90840 c8be14e9-c24a-4542-bbc5-5e389d0a6034
... ... ... ... ... ... ...
2370144 2021-11-30:22:59:51 wss://10.252.229.14/api/v2/data/main/object Car Application/Channel/Count/0/Processing/CountRu... 38.90988 c24f5be9-2474-4b80-b54c-766398f462ec
2370145 2021-11-30:22:59:51 wss://10.252.229.13/api/v2/data/main/object Car Application/Channel/Count/0/Processing/CountRu... 44.01072 2ced7471-922d-4e35-922c-e854acf891c7
2370146 2021-11-30:22:59:51 wss://10.252.229.13/api/v2/data/main/object Car Application/Channel/Count/0/Processing/CountRu... 40.30704 7bdb8752-fd32-4cb2-b394-e7e900b95fd5
2370147 2021-11-30:22:59:56 wss://10.252.229.12/api/v2/data/main/object Car Application/Channel/Count/0/Processing/CountRu... 71.63712 6667479f-1b76-4a9b-905c-5387edcf16fd
2370148 2021-11-30:22:59:57 wss://10.252.229.12/api/v2/data/main/object Car Application/Channel/Count/0/Processing/CountRu... 60.88536 9ce75489-b1af-4d91-8810-16dddff967f1

2370149 rows × 6 columns

2.2) Sorama (Sound Sensor Data)¶

In [ ]:
df_sorama_full = pd.read_csv (r'/Users/maxwellernst/Documents/data/modified_data.csv', sep=',')
df_sorama_full['deviceId'] = df_sorama_full['deviceId'].astype(str)
df_sorama_full
Out[ ]:
deviceId Content_time Content_value Date Time Year Month Hour Datetime
0 2,03804E+18 2021-11-02 09:49:22.310 57.077148 2021-11-02 1900-01-01 09:49:22.310 2021 11 9 2021-11-02 09:49:22.310000-01:00
1 2,03804E+18 2021-11-02 09:49:23.310 57.639920 2021-11-02 1900-01-01 09:49:23.310 2021 11 9 2021-11-02 09:49:23.310000-01:00
2 2,03804E+18 2021-11-02 09:49:24.310 58.501630 2021-11-02 1900-01-01 09:49:24.310 2021 11 9 2021-11-02 09:49:24.310000-01:00
3 2,03804E+18 2021-11-02 09:49:25.310 57.684305 2021-11-02 1900-01-01 09:49:25.310 2021 11 9 2021-11-02 09:49:25.310000-01:00
4 2,03804E+18 2021-11-02 09:49:26.310 57.416267 2021-11-02 1900-01-01 09:49:26.310 2021 11 9 2021-11-02 09:49:26.310000-01:00
... ... ... ... ... ... ... ... ... ...
1048570 2,03804E+18 2021-11-02 09:49:06.610 58.866556 2021-11-02 1900-01-01 09:49:06.610 2021 11 9 2021-11-02 09:49:06.610000-01:00
1048571 2,03804E+18 2021-11-02 09:49:07.610 57.896402 2021-11-02 1900-01-01 09:49:07.610 2021 11 9 2021-11-02 09:49:07.610000-01:00
1048572 2,03804E+18 2021-11-02 09:49:08.610 58.556143 2021-11-02 1900-01-01 09:49:08.610 2021 11 9 2021-11-02 09:49:08.610000-01:00
1048573 2,03804E+18 2021-11-02 09:49:09.610 58.915390 2021-11-02 1900-01-01 09:49:09.610 2021 11 9 2021-11-02 09:49:09.610000-01:00
1048574 2,03804E+18 2021-11-02 09:49:10.610 59.383412 2021-11-02 1900-01-01 09:49:10.610 2021 11 9 2021-11-02 09:49:10.610000-01:00

1048575 rows × 9 columns

3) Data Conversions (Transform)¶

To be able to publish the Vinotion and Sorama data to a data space, the data sets will need to be converted into the Smart Data Models format by FIWARE. These models are standardised models created by FIWARE. This conversion is MIM2 (Minimal Interoperable Mechansim 2) - data models and sharing. With this conversion, an API can then be used (MIM1) to publish the data to a data space.

The conversion of the Vinotion will use the TrafficFlowObserved SMD model : https://github.com/smart-data-models/dataModel.Transportation/blob/master/TrafficFlowObserved/doc/spec.md

The conversion of thE Sorama data will use the NoiseLevelObserved SMD model : https://github.com/smart-data-models/dataModel.Environment/blob/master/NoiseLevelObserved/doc/spec.md

Figure 2: Data Model conversion of Inzicht Verlicht Data to Smart Data Model

Data Spaces ERD

3.1) Vinotion Conversion¶

In [ ]:
#Print Columns
vinotion_columns = df_vinotion_full.columns
print(vinotion_columns)
Index(['datetime_short', 'cameraId', 'classification', 'ruleId', 'speed',
       'uuid'],
      dtype='object')

Selecting the necessary columns from the original CSV file.

In [ ]:
# Select specific columns
df_vinotion_SMD = df_vinotion_full[['datetime_short','cameraId', 'classification', 'speed', 'uuid']].copy()
df_vinotion_SMD
Out[ ]:
datetime_short cameraId classification speed uuid
0 2021-11-02:09:51:39 wss://10.252.229.12/api/v2/data/main/object Car 54.49428 be698d9d-35dd-4588-8f1a-149c116c3973
1 2021-11-02:09:51:40 wss://10.252.229.12/api/v2/data/main/object Car 50.97996 91a1416c-c3f3-4498-9dae-8d32c0ae7457
2 2021-11-02:09:51:41 wss://10.252.229.12/api/v2/data/main/object Car 55.75644 ac92fd4b-807f-4ce9-ac2e-7b9ab0b8fb78
3 2021-11-02:09:51:42 wss://10.252.229.12/api/v2/data/main/object Car 51.20208 bf1a1528-1fe7-4558-b1e0-9f59fc5d4a88
4 2021-11-02:09:51:43 wss://10.252.229.12/api/v2/data/main/object Car 60.90840 c8be14e9-c24a-4542-bbc5-5e389d0a6034
... ... ... ... ... ...
2370144 2021-11-30:22:59:51 wss://10.252.229.14/api/v2/data/main/object Car 38.90988 c24f5be9-2474-4b80-b54c-766398f462ec
2370145 2021-11-30:22:59:51 wss://10.252.229.13/api/v2/data/main/object Car 44.01072 2ced7471-922d-4e35-922c-e854acf891c7
2370146 2021-11-30:22:59:51 wss://10.252.229.13/api/v2/data/main/object Car 40.30704 7bdb8752-fd32-4cb2-b394-e7e900b95fd5
2370147 2021-11-30:22:59:56 wss://10.252.229.12/api/v2/data/main/object Car 71.63712 6667479f-1b76-4a9b-905c-5387edcf16fd
2370148 2021-11-30:22:59:57 wss://10.252.229.12/api/v2/data/main/object Car 60.88536 9ce75489-b1af-4d91-8810-16dddff967f1

2370149 rows × 5 columns

3.1.1) Aggregating Vinotion data¶

Aggregating the data to one minute intervals, since the Smart Data Models have the averageSpeed and dateObservedFrom and DateObservedTo columns. One minute intervals were used as discussed with Eindhoven Municipality and thats what they prefered.

In [ ]:
# Convert 'dateObservedFrom' and 'dateObservedTo' columns to datetime type
df_vinotion_SMD['datetime_short'] = pd.to_datetime(df_vinotion_SMD['datetime_short'], format='%Y-%m-%d:%H:%M:%S')
df_vinotion_SMD
Out[ ]:
datetime_short cameraId classification speed uuid
0 2021-11-02 09:51:39 wss://10.252.229.12/api/v2/data/main/object Car 54.49428 be698d9d-35dd-4588-8f1a-149c116c3973
1 2021-11-02 09:51:40 wss://10.252.229.12/api/v2/data/main/object Car 50.97996 91a1416c-c3f3-4498-9dae-8d32c0ae7457
2 2021-11-02 09:51:41 wss://10.252.229.12/api/v2/data/main/object Car 55.75644 ac92fd4b-807f-4ce9-ac2e-7b9ab0b8fb78
3 2021-11-02 09:51:42 wss://10.252.229.12/api/v2/data/main/object Car 51.20208 bf1a1528-1fe7-4558-b1e0-9f59fc5d4a88
4 2021-11-02 09:51:43 wss://10.252.229.12/api/v2/data/main/object Car 60.90840 c8be14e9-c24a-4542-bbc5-5e389d0a6034
... ... ... ... ... ...
2370144 2021-11-30 22:59:51 wss://10.252.229.14/api/v2/data/main/object Car 38.90988 c24f5be9-2474-4b80-b54c-766398f462ec
2370145 2021-11-30 22:59:51 wss://10.252.229.13/api/v2/data/main/object Car 44.01072 2ced7471-922d-4e35-922c-e854acf891c7
2370146 2021-11-30 22:59:51 wss://10.252.229.13/api/v2/data/main/object Car 40.30704 7bdb8752-fd32-4cb2-b394-e7e900b95fd5
2370147 2021-11-30 22:59:56 wss://10.252.229.12/api/v2/data/main/object Car 71.63712 6667479f-1b76-4a9b-905c-5387edcf16fd
2370148 2021-11-30 22:59:57 wss://10.252.229.12/api/v2/data/main/object Car 60.88536 9ce75489-b1af-4d91-8810-16dddff967f1

2370149 rows × 5 columns

In [ ]:
# Set datetime_short as the index
df_vinotion_SMD.set_index('datetime_short', inplace=True)
df_vinotion_SMD
Out[ ]:
cameraId classification speed uuid
datetime_short
2021-11-02 09:51:39 wss://10.252.229.12/api/v2/data/main/object Car 54.49428 be698d9d-35dd-4588-8f1a-149c116c3973
2021-11-02 09:51:40 wss://10.252.229.12/api/v2/data/main/object Car 50.97996 91a1416c-c3f3-4498-9dae-8d32c0ae7457
2021-11-02 09:51:41 wss://10.252.229.12/api/v2/data/main/object Car 55.75644 ac92fd4b-807f-4ce9-ac2e-7b9ab0b8fb78
2021-11-02 09:51:42 wss://10.252.229.12/api/v2/data/main/object Car 51.20208 bf1a1528-1fe7-4558-b1e0-9f59fc5d4a88
2021-11-02 09:51:43 wss://10.252.229.12/api/v2/data/main/object Car 60.90840 c8be14e9-c24a-4542-bbc5-5e389d0a6034
... ... ... ... ...
2021-11-30 22:59:51 wss://10.252.229.14/api/v2/data/main/object Car 38.90988 c24f5be9-2474-4b80-b54c-766398f462ec
2021-11-30 22:59:51 wss://10.252.229.13/api/v2/data/main/object Car 44.01072 2ced7471-922d-4e35-922c-e854acf891c7
2021-11-30 22:59:51 wss://10.252.229.13/api/v2/data/main/object Car 40.30704 7bdb8752-fd32-4cb2-b394-e7e900b95fd5
2021-11-30 22:59:56 wss://10.252.229.12/api/v2/data/main/object Car 71.63712 6667479f-1b76-4a9b-905c-5387edcf16fd
2021-11-30 22:59:57 wss://10.252.229.12/api/v2/data/main/object Car 60.88536 9ce75489-b1af-4d91-8810-16dddff967f1

2370149 rows × 4 columns

Grouping by the camera ID, and datetime_short, getting the average speed and combining the multiple values for the vehciles observed in the classification.

In [ ]:
# Resample to one-minute intervals and aggregate by cameraId
df_vinotion_SMD = df_vinotion_SMD.groupby('cameraId').resample('1T').agg({
    'classification': lambda x: list(x),  # Collect all classifications into a list
    'speed': 'mean',  # Calculate the average speed
    'uuid': lambda x: list(x)  # Collect all UUIDs into a list
}).copy()

df_vinotion_SMD
Out[ ]:
classification speed uuid
cameraId datetime_short
wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:51:00 [Car, Car, Car, Truck, Car, Car, Car, Truck] 45.112545 [030e5136-bcb1-4162-b686-41cc63667155, 2880199...
2021-11-02 09:52:00 [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... 38.883007 [05fcbbf5-29e8-440e-a243-409740d64788, b0b1935...
2021-11-02 09:53:00 [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... 39.584312 [de667d19-4473-474f-ad89-f674d8a73e88, b28dfb6...
2021-11-02 09:54:00 [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... 38.379448 [33f2ad13-292c-47df-8cbe-3e948f11d534, a855d4c...
2021-11-02 09:55:00 [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... 40.971777 [6dbbbee2-dc5d-4818-a30a-c68203ebb944, b7e7eff...
... ... ... ... ...
wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:55:00 [Car, Car] 56.446200 [9c7f1eed-9c40-46f1-95f0-27f79706dfeb, d7f225f...
2021-11-30 22:56:00 [Car, Car, Car] 58.041480 [f6ab0d5f-2d85-4a19-bf20-baec498210d6, 67a4a3e...
2021-11-30 22:57:00 [] NaN []
2021-11-30 22:58:00 [Car] 54.285480 [b7a7f85f-06f0-4ef1-b4c6-34d2b9d7a934]
2021-11-30 22:59:00 [Car] 54.156960 [522263fa-6df0-4280-afa6-365ffc0a552b]

205545 rows × 3 columns

Reset index to make datetime_short a column again

In [ ]:
df_vinotion_SMD.reset_index(inplace=True)
df_vinotion_SMD
Out[ ]:
cameraId datetime_short classification speed uuid
0 wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:51:00 [Car, Car, Car, Truck, Car, Car, Car, Truck] 45.112545 [030e5136-bcb1-4162-b686-41cc63667155, 2880199...
1 wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:52:00 [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... 38.883007 [05fcbbf5-29e8-440e-a243-409740d64788, b0b1935...
2 wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:53:00 [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... 39.584312 [de667d19-4473-474f-ad89-f674d8a73e88, b28dfb6...
3 wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:54:00 [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... 38.379448 [33f2ad13-292c-47df-8cbe-3e948f11d534, a855d4c...
4 wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:55:00 [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... 40.971777 [6dbbbee2-dc5d-4818-a30a-c68203ebb944, b7e7eff...
... ... ... ... ... ...
205540 wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:55:00 [Car, Car] 56.446200 [9c7f1eed-9c40-46f1-95f0-27f79706dfeb, d7f225f...
205541 wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:56:00 [Car, Car, Car] 58.041480 [f6ab0d5f-2d85-4a19-bf20-baec498210d6, 67a4a3e...
205542 wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:57:00 [] NaN []
205543 wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:58:00 [Car] 54.285480 [b7a7f85f-06f0-4ef1-b4c6-34d2b9d7a934]
205544 wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:59:00 [Car] 54.156960 [522263fa-6df0-4280-afa6-365ffc0a552b]

205545 rows × 5 columns

3.1.2) Adding columns¶

In this section the columns from the Smart Data Model : TrafficFlowObeserved are added, based on which columns could be added and what columns Eindhoven Municipality wanted to have in the conversion from the Smart Data Model. The names and types for each column also match those of the Smart Data Model.

In [ ]:
# Add a unique id column (starting from 1)
df_vinotion_SMD['id'] = range(1, len(df_vinotion_SMD) + 1)
df_vinotion_SMD
Out[ ]:
cameraId datetime_short classification speed uuid id
0 wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:51:00 [Car, Car, Car, Truck, Car, Car, Car, Truck] 45.112545 [030e5136-bcb1-4162-b686-41cc63667155, 2880199... 1
1 wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:52:00 [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... 38.883007 [05fcbbf5-29e8-440e-a243-409740d64788, b0b1935... 2
2 wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:53:00 [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... 39.584312 [de667d19-4473-474f-ad89-f674d8a73e88, b28dfb6... 3
3 wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:54:00 [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... 38.379448 [33f2ad13-292c-47df-8cbe-3e948f11d534, a855d4c... 4
4 wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:55:00 [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... 40.971777 [6dbbbee2-dc5d-4818-a30a-c68203ebb944, b7e7eff... 5
... ... ... ... ... ... ...
205540 wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:55:00 [Car, Car] 56.446200 [9c7f1eed-9c40-46f1-95f0-27f79706dfeb, d7f225f... 205541
205541 wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:56:00 [Car, Car, Car] 58.041480 [f6ab0d5f-2d85-4a19-bf20-baec498210d6, 67a4a3e... 205542
205542 wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:57:00 [] NaN [] 205543
205543 wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:58:00 [Car] 54.285480 [b7a7f85f-06f0-4ef1-b4c6-34d2b9d7a934] 205544
205544 wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:59:00 [Car] 54.156960 [522263fa-6df0-4280-afa6-365ffc0a552b] 205545

205545 rows × 6 columns

Rename datetime_short column to dateObservedFrom.

In [ ]:
df_vinotion_SMD.rename(columns={'datetime_short': 'dateObservedFrom'}, inplace=True)
df_vinotion_SMD
Out[ ]:
cameraId dateObservedFrom classification speed uuid id
0 wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:51:00 [Car, Car, Car, Truck, Car, Car, Car, Truck] 45.112545 [030e5136-bcb1-4162-b686-41cc63667155, 2880199... 1
1 wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:52:00 [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... 38.883007 [05fcbbf5-29e8-440e-a243-409740d64788, b0b1935... 2
2 wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:53:00 [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... 39.584312 [de667d19-4473-474f-ad89-f674d8a73e88, b28dfb6... 3
3 wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:54:00 [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... 38.379448 [33f2ad13-292c-47df-8cbe-3e948f11d534, a855d4c... 4
4 wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:55:00 [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... 40.971777 [6dbbbee2-dc5d-4818-a30a-c68203ebb944, b7e7eff... 5
... ... ... ... ... ... ...
205540 wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:55:00 [Car, Car] 56.446200 [9c7f1eed-9c40-46f1-95f0-27f79706dfeb, d7f225f... 205541
205541 wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:56:00 [Car, Car, Car] 58.041480 [f6ab0d5f-2d85-4a19-bf20-baec498210d6, 67a4a3e... 205542
205542 wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:57:00 [] NaN [] 205543
205543 wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:58:00 [Car] 54.285480 [b7a7f85f-06f0-4ef1-b4c6-34d2b9d7a934] 205544
205544 wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:59:00 [Car] 54.156960 [522263fa-6df0-4280-afa6-365ffc0a552b] 205545

205545 rows × 6 columns

Create dateObservedTo by shifting dateObservedFrom by one minute

In [ ]:
# Group by 'cameraId' and create 'dateObservedTo'
df_vinotion_SMD['dateObservedTo'] = df_vinotion_SMD.groupby('cameraId')['dateObservedFrom'].shift(-1)
df_vinotion_SMD
Out[ ]:
cameraId dateObservedFrom classification speed uuid id dateObservedTo
0 wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:51:00 [Car, Car, Car, Truck, Car, Car, Car, Truck] 45.112545 [030e5136-bcb1-4162-b686-41cc63667155, 2880199... 1 2021-11-02 09:52:00
1 wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:52:00 [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... 38.883007 [05fcbbf5-29e8-440e-a243-409740d64788, b0b1935... 2 2021-11-02 09:53:00
2 wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:53:00 [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... 39.584312 [de667d19-4473-474f-ad89-f674d8a73e88, b28dfb6... 3 2021-11-02 09:54:00
3 wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:54:00 [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... 38.379448 [33f2ad13-292c-47df-8cbe-3e948f11d534, a855d4c... 4 2021-11-02 09:55:00
4 wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:55:00 [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... 40.971777 [6dbbbee2-dc5d-4818-a30a-c68203ebb944, b7e7eff... 5 2021-11-02 09:56:00
... ... ... ... ... ... ... ...
205540 wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:55:00 [Car, Car] 56.446200 [9c7f1eed-9c40-46f1-95f0-27f79706dfeb, d7f225f... 205541 2021-11-30 22:56:00
205541 wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:56:00 [Car, Car, Car] 58.041480 [f6ab0d5f-2d85-4a19-bf20-baec498210d6, 67a4a3e... 205542 2021-11-30 22:57:00
205542 wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:57:00 [] NaN [] 205543 2021-11-30 22:58:00
205543 wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:58:00 [Car] 54.285480 [b7a7f85f-06f0-4ef1-b4c6-34d2b9d7a934] 205544 2021-11-30 22:59:00
205544 wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:59:00 [Car] 54.156960 [522263fa-6df0-4280-afa6-365ffc0a552b] 205545 NaT

205545 rows × 7 columns

Adding the "type" column which is the type of Smart Data Model used, in this case its "TrafficFlowObserved".

In [ ]:
#TrafficFlowObserved
df_vinotion_SMD['type'] = 'TrafficFlowObserved'
df_vinotion_SMD
Out[ ]:
cameraId dateObservedFrom classification speed uuid id dateObservedTo type
0 wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:51:00 [Car, Car, Car, Truck, Car, Car, Car, Truck] 45.112545 [030e5136-bcb1-4162-b686-41cc63667155, 2880199... 1 2021-11-02 09:52:00 TrafficFlowObserved
1 wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:52:00 [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... 38.883007 [05fcbbf5-29e8-440e-a243-409740d64788, b0b1935... 2 2021-11-02 09:53:00 TrafficFlowObserved
2 wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:53:00 [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... 39.584312 [de667d19-4473-474f-ad89-f674d8a73e88, b28dfb6... 3 2021-11-02 09:54:00 TrafficFlowObserved
3 wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:54:00 [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... 38.379448 [33f2ad13-292c-47df-8cbe-3e948f11d534, a855d4c... 4 2021-11-02 09:55:00 TrafficFlowObserved
4 wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:55:00 [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... 40.971777 [6dbbbee2-dc5d-4818-a30a-c68203ebb944, b7e7eff... 5 2021-11-02 09:56:00 TrafficFlowObserved
... ... ... ... ... ... ... ... ...
205540 wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:55:00 [Car, Car] 56.446200 [9c7f1eed-9c40-46f1-95f0-27f79706dfeb, d7f225f... 205541 2021-11-30 22:56:00 TrafficFlowObserved
205541 wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:56:00 [Car, Car, Car] 58.041480 [f6ab0d5f-2d85-4a19-bf20-baec498210d6, 67a4a3e... 205542 2021-11-30 22:57:00 TrafficFlowObserved
205542 wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:57:00 [] NaN [] 205543 2021-11-30 22:58:00 TrafficFlowObserved
205543 wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:58:00 [Car] 54.285480 [b7a7f85f-06f0-4ef1-b4c6-34d2b9d7a934] 205544 2021-11-30 22:59:00 TrafficFlowObserved
205544 wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:59:00 [Car] 54.156960 [522263fa-6df0-4280-afa6-365ffc0a552b] 205545 NaT TrafficFlowObserved

205545 rows × 8 columns

Renaming more columns to match the Smart Data Models.

In [ ]:
#renaming columns
df_vinotion_SMD = df_vinotion_SMD.rename(columns={'cameraId': 'source', 'classification':'vehicleType', 'speed':'averageVehicleSpeed','uuid':'name'})
df_vinotion_SMD
Out[ ]:
source dateObservedFrom vehicleType averageVehicleSpeed name id dateObservedTo type
0 wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:51:00 [Car, Car, Car, Truck, Car, Car, Car, Truck] 45.112545 [030e5136-bcb1-4162-b686-41cc63667155, 2880199... 1 2021-11-02 09:52:00 TrafficFlowObserved
1 wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:52:00 [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... 38.883007 [05fcbbf5-29e8-440e-a243-409740d64788, b0b1935... 2 2021-11-02 09:53:00 TrafficFlowObserved
2 wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:53:00 [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... 39.584312 [de667d19-4473-474f-ad89-f674d8a73e88, b28dfb6... 3 2021-11-02 09:54:00 TrafficFlowObserved
3 wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:54:00 [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... 38.379448 [33f2ad13-292c-47df-8cbe-3e948f11d534, a855d4c... 4 2021-11-02 09:55:00 TrafficFlowObserved
4 wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:55:00 [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... 40.971777 [6dbbbee2-dc5d-4818-a30a-c68203ebb944, b7e7eff... 5 2021-11-02 09:56:00 TrafficFlowObserved
... ... ... ... ... ... ... ... ...
205540 wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:55:00 [Car, Car] 56.446200 [9c7f1eed-9c40-46f1-95f0-27f79706dfeb, d7f225f... 205541 2021-11-30 22:56:00 TrafficFlowObserved
205541 wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:56:00 [Car, Car, Car] 58.041480 [f6ab0d5f-2d85-4a19-bf20-baec498210d6, 67a4a3e... 205542 2021-11-30 22:57:00 TrafficFlowObserved
205542 wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:57:00 [] NaN [] 205543 2021-11-30 22:58:00 TrafficFlowObserved
205543 wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:58:00 [Car] 54.285480 [b7a7f85f-06f0-4ef1-b4c6-34d2b9d7a934] 205544 2021-11-30 22:59:00 TrafficFlowObserved
205544 wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:59:00 [Car] 54.156960 [522263fa-6df0-4280-afa6-365ffc0a552b] 205545 NaT TrafficFlowObserved

205545 rows × 8 columns

Adding dateCreating and dateModeified columns to enrich the data by giving information about when the data was created and when it was modified. This is set to the current time and data, which can be modified.

In [ ]:
# Add dateCreated and dateModified columns
df_vinotion_SMD['dateCreated'] = pd.Timestamp.now()
df_vinotion_SMD['dateModified'] = pd.Timestamp.now()
df_vinotion_SMD
Out[ ]:
source dateObservedFrom vehicleType averageVehicleSpeed name id dateObservedTo type dateCreated dateModified
0 wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:51:00 [Car, Car, Car, Truck, Car, Car, Car, Truck] 45.112545 [030e5136-bcb1-4162-b686-41cc63667155, 2880199... 1 2021-11-02 09:52:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993
1 wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:52:00 [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... 38.883007 [05fcbbf5-29e8-440e-a243-409740d64788, b0b1935... 2 2021-11-02 09:53:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993
2 wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:53:00 [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... 39.584312 [de667d19-4473-474f-ad89-f674d8a73e88, b28dfb6... 3 2021-11-02 09:54:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993
3 wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:54:00 [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... 38.379448 [33f2ad13-292c-47df-8cbe-3e948f11d534, a855d4c... 4 2021-11-02 09:55:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993
4 wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:55:00 [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... 40.971777 [6dbbbee2-dc5d-4818-a30a-c68203ebb944, b7e7eff... 5 2021-11-02 09:56:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993
... ... ... ... ... ... ... ... ... ... ...
205540 wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:55:00 [Car, Car] 56.446200 [9c7f1eed-9c40-46f1-95f0-27f79706dfeb, d7f225f... 205541 2021-11-30 22:56:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993
205541 wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:56:00 [Car, Car, Car] 58.041480 [f6ab0d5f-2d85-4a19-bf20-baec498210d6, 67a4a3e... 205542 2021-11-30 22:57:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993
205542 wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:57:00 [] NaN [] 205543 2021-11-30 22:58:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993
205543 wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:58:00 [Car] 54.285480 [b7a7f85f-06f0-4ef1-b4c6-34d2b9d7a934] 205544 2021-11-30 22:59:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993
205544 wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:59:00 [Car] 54.156960 [522263fa-6df0-4280-afa6-365ffc0a552b] 205545 NaT TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993

205545 rows × 10 columns

Calculate intensity: Total number of vehicles detected during this observation period.

In [ ]:
df_vinotion_SMD['intensity'] = df_vinotion_SMD['vehicleType'].apply(len)
df_vinotion_SMD
Out[ ]:
source dateObservedFrom vehicleType averageVehicleSpeed name id dateObservedTo type dateCreated dateModified intensity
0 wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:51:00 [Car, Car, Car, Truck, Car, Car, Car, Truck] 45.112545 [030e5136-bcb1-4162-b686-41cc63667155, 2880199... 1 2021-11-02 09:52:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 8
1 wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:52:00 [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... 38.883007 [05fcbbf5-29e8-440e-a243-409740d64788, b0b1935... 2 2021-11-02 09:53:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 24
2 wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:53:00 [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... 39.584312 [de667d19-4473-474f-ad89-f674d8a73e88, b28dfb6... 3 2021-11-02 09:54:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 28
3 wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:54:00 [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... 38.379448 [33f2ad13-292c-47df-8cbe-3e948f11d534, a855d4c... 4 2021-11-02 09:55:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 31
4 wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:55:00 [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... 40.971777 [6dbbbee2-dc5d-4818-a30a-c68203ebb944, b7e7eff... 5 2021-11-02 09:56:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 23
... ... ... ... ... ... ... ... ... ... ... ...
205540 wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:55:00 [Car, Car] 56.446200 [9c7f1eed-9c40-46f1-95f0-27f79706dfeb, d7f225f... 205541 2021-11-30 22:56:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 2
205541 wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:56:00 [Car, Car, Car] 58.041480 [f6ab0d5f-2d85-4a19-bf20-baec498210d6, 67a4a3e... 205542 2021-11-30 22:57:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 3
205542 wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:57:00 [] NaN [] 205543 2021-11-30 22:58:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 0
205543 wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:58:00 [Car] 54.285480 [b7a7f85f-06f0-4ef1-b4c6-34d2b9d7a934] 205544 2021-11-30 22:59:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 1
205544 wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:59:00 [Car] 54.156960 [522263fa-6df0-4280-afa6-365ffc0a552b] 205545 NaT TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 1

205545 rows × 11 columns

Adding the dataProvider column which in this case is "Vinotion".

In [ ]:
df_vinotion_SMD['dataProvider'] = 'Vinotion'
df_vinotion_SMD
Out[ ]:
source dateObservedFrom vehicleType averageVehicleSpeed name id dateObservedTo type dateCreated dateModified intensity dataProvider
0 wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:51:00 [Car, Car, Car, Truck, Car, Car, Car, Truck] 45.112545 [030e5136-bcb1-4162-b686-41cc63667155, 2880199... 1 2021-11-02 09:52:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 8 Vinotion
1 wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:52:00 [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... 38.883007 [05fcbbf5-29e8-440e-a243-409740d64788, b0b1935... 2 2021-11-02 09:53:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 24 Vinotion
2 wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:53:00 [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... 39.584312 [de667d19-4473-474f-ad89-f674d8a73e88, b28dfb6... 3 2021-11-02 09:54:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 28 Vinotion
3 wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:54:00 [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... 38.379448 [33f2ad13-292c-47df-8cbe-3e948f11d534, a855d4c... 4 2021-11-02 09:55:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 31 Vinotion
4 wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:55:00 [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... 40.971777 [6dbbbee2-dc5d-4818-a30a-c68203ebb944, b7e7eff... 5 2021-11-02 09:56:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 23 Vinotion
... ... ... ... ... ... ... ... ... ... ... ... ...
205540 wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:55:00 [Car, Car] 56.446200 [9c7f1eed-9c40-46f1-95f0-27f79706dfeb, d7f225f... 205541 2021-11-30 22:56:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 2 Vinotion
205541 wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:56:00 [Car, Car, Car] 58.041480 [f6ab0d5f-2d85-4a19-bf20-baec498210d6, 67a4a3e... 205542 2021-11-30 22:57:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 3 Vinotion
205542 wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:57:00 [] NaN [] 205543 2021-11-30 22:58:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 0 Vinotion
205543 wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:58:00 [Car] 54.285480 [b7a7f85f-06f0-4ef1-b4c6-34d2b9d7a934] 205544 2021-11-30 22:59:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 1 Vinotion
205544 wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:59:00 [Car] 54.156960 [522263fa-6df0-4280-afa6-365ffc0a552b] 205545 NaT TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 1 Vinotion

205545 rows × 12 columns

Calculating the averageHeadwayTime column which shows the average time in seconds between each vehicle. Indicating how far each vehcile is from one another. You can consider the two second rule, which is how many seconds you should be behind the vehicle in front of you.

Considering that the average is caluclated by finding the time difference between each vehicle, the average is calculated by dividing the total average time by the number of vehicles minus 1 since we want to know the average time difference between each headway and not all the vehicles (known as the fence post error).

In [ ]:
df_vinotion_SMD['averageHeadwayTime'] = df_vinotion_SMD.groupby('source')['dateObservedFrom'].diff().dt.total_seconds().shift(-1) / (df_vinotion_SMD['intensity'] -1)

# Replace infinite values and negative values with NaN.
df_vinotion_SMD['averageHeadwayTime'].replace([np.inf, -np.inf], np.nan, inplace=True)
df_vinotion_SMD.loc[df_vinotion_SMD['averageHeadwayTime'] < 0, 'averageHeadwayTime'] = np.nan

df_vinotion_SMD
Out[ ]:
source dateObservedFrom vehicleType averageVehicleSpeed name id dateObservedTo type dateCreated dateModified intensity dataProvider averageHeadwayTime
0 wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:51:00 [Car, Car, Car, Truck, Car, Car, Car, Truck] 45.112545 [030e5136-bcb1-4162-b686-41cc63667155, 2880199... 1 2021-11-02 09:52:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 8 Vinotion 8.571429
1 wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:52:00 [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... 38.883007 [05fcbbf5-29e8-440e-a243-409740d64788, b0b1935... 2 2021-11-02 09:53:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 24 Vinotion 2.608696
2 wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:53:00 [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... 39.584312 [de667d19-4473-474f-ad89-f674d8a73e88, b28dfb6... 3 2021-11-02 09:54:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 28 Vinotion 2.222222
3 wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:54:00 [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... 38.379448 [33f2ad13-292c-47df-8cbe-3e948f11d534, a855d4c... 4 2021-11-02 09:55:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 31 Vinotion 2.000000
4 wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:55:00 [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... 40.971777 [6dbbbee2-dc5d-4818-a30a-c68203ebb944, b7e7eff... 5 2021-11-02 09:56:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 23 Vinotion 2.727273
... ... ... ... ... ... ... ... ... ... ... ... ... ...
205540 wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:55:00 [Car, Car] 56.446200 [9c7f1eed-9c40-46f1-95f0-27f79706dfeb, d7f225f... 205541 2021-11-30 22:56:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 2 Vinotion 60.000000
205541 wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:56:00 [Car, Car, Car] 58.041480 [f6ab0d5f-2d85-4a19-bf20-baec498210d6, 67a4a3e... 205542 2021-11-30 22:57:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 3 Vinotion 30.000000
205542 wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:57:00 [] NaN [] 205543 2021-11-30 22:58:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 0 Vinotion NaN
205543 wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:58:00 [Car] 54.285480 [b7a7f85f-06f0-4ef1-b4c6-34d2b9d7a934] 205544 2021-11-30 22:59:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 1 Vinotion NaN
205544 wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:59:00 [Car] 54.156960 [522263fa-6df0-4280-afa6-365ffc0a552b] 205545 NaT TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 1 Vinotion NaN

205545 rows × 13 columns

Next the "congested" boolean column needs to be made. This is done by calulating the average speed in the last hour, and the speed decrease.

In [ ]:
# Calculate average speed of the last hour
df_vinotion_SMD['avg_speed_last_hour'] = df_vinotion_SMD.groupby('source')['averageVehicleSpeed'].rolling(window=60).mean().reset_index(level=0, drop=True)
df_vinotion_SMD
Out[ ]:
source dateObservedFrom vehicleType averageVehicleSpeed name id dateObservedTo type dateCreated dateModified intensity dataProvider averageHeadwayTime avg_speed_last_hour
0 wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:51:00 [Car, Car, Car, Truck, Car, Car, Car, Truck] 45.112545 [030e5136-bcb1-4162-b686-41cc63667155, 2880199... 1 2021-11-02 09:52:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 8 Vinotion 8.571429 NaN
1 wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:52:00 [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... 38.883007 [05fcbbf5-29e8-440e-a243-409740d64788, b0b1935... 2 2021-11-02 09:53:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 24 Vinotion 2.608696 NaN
2 wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:53:00 [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... 39.584312 [de667d19-4473-474f-ad89-f674d8a73e88, b28dfb6... 3 2021-11-02 09:54:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 28 Vinotion 2.222222 NaN
3 wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:54:00 [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... 38.379448 [33f2ad13-292c-47df-8cbe-3e948f11d534, a855d4c... 4 2021-11-02 09:55:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 31 Vinotion 2.000000 NaN
4 wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:55:00 [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... 40.971777 [6dbbbee2-dc5d-4818-a30a-c68203ebb944, b7e7eff... 5 2021-11-02 09:56:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 23 Vinotion 2.727273 NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
205540 wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:55:00 [Car, Car] 56.446200 [9c7f1eed-9c40-46f1-95f0-27f79706dfeb, d7f225f... 205541 2021-11-30 22:56:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 2 Vinotion 60.000000 NaN
205541 wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:56:00 [Car, Car, Car] 58.041480 [f6ab0d5f-2d85-4a19-bf20-baec498210d6, 67a4a3e... 205542 2021-11-30 22:57:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 3 Vinotion 30.000000 NaN
205542 wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:57:00 [] NaN [] 205543 2021-11-30 22:58:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 0 Vinotion NaN NaN
205543 wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:58:00 [Car] 54.285480 [b7a7f85f-06f0-4ef1-b4c6-34d2b9d7a934] 205544 2021-11-30 22:59:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 1 Vinotion NaN NaN
205544 wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:59:00 [Car] 54.156960 [522263fa-6df0-4280-afa6-365ffc0a552b] 205545 NaT TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 1 Vinotion NaN NaN

205545 rows × 14 columns

In [ ]:
#Grouped the data by id.
#Using rolling(window='3600s') to define a rolling window of 3600 seconds (1 hour).
#Calculating the mean (.mean()) of averageVehicleSpeed within each rolling window.
#Use .reset_index(level=0, drop=True) to remove the additional index created by the rolling operation and keep the original grouping intact.


df_vinotion_SMD.set_index('dateObservedFrom', inplace=True)
df_vinotion_SMD['avg_speed_last_hour'] = df_vinotion_SMD.groupby('source')['averageVehicleSpeed'].rolling(window='3600s').mean().reset_index(level=0, drop=True)
df_vinotion_SMD.reset_index(inplace=True)

df_vinotion_SMD
Out[ ]:
dateObservedFrom source vehicleType averageVehicleSpeed name id dateObservedTo type dateCreated dateModified intensity dataProvider averageHeadwayTime avg_speed_last_hour
0 2021-11-02 09:51:00 wss://10.252.229.11/api/v2/data/main/object [Car, Car, Car, Truck, Car, Car, Car, Truck] 45.112545 [030e5136-bcb1-4162-b686-41cc63667155, 2880199... 1 2021-11-02 09:52:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 8 Vinotion 8.571429 45.112545
1 2021-11-02 09:52:00 wss://10.252.229.11/api/v2/data/main/object [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... 38.883007 [05fcbbf5-29e8-440e-a243-409740d64788, b0b1935... 2 2021-11-02 09:53:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 24 Vinotion 2.608696 41.997776
2 2021-11-02 09:53:00 wss://10.252.229.11/api/v2/data/main/object [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... 39.584312 [de667d19-4473-474f-ad89-f674d8a73e88, b28dfb6... 3 2021-11-02 09:54:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 28 Vinotion 2.222222 41.193288
3 2021-11-02 09:54:00 wss://10.252.229.11/api/v2/data/main/object [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... 38.379448 [33f2ad13-292c-47df-8cbe-3e948f11d534, a855d4c... 4 2021-11-02 09:55:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 31 Vinotion 2.000000 40.489828
4 2021-11-02 09:55:00 wss://10.252.229.11/api/v2/data/main/object [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... 40.971777 [6dbbbee2-dc5d-4818-a30a-c68203ebb944, b7e7eff... 5 2021-11-02 09:56:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 23 Vinotion 2.727273 40.586218
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
205540 2021-11-30 22:55:00 wss://10.252.229.15/api/v2/data/main/object [Car, Car] 56.446200 [9c7f1eed-9c40-46f1-95f0-27f79706dfeb, d7f225f... 205541 2021-11-30 22:56:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 2 Vinotion 60.000000 52.453460
205541 2021-11-30 22:56:00 wss://10.252.229.15/api/v2/data/main/object [Car, Car, Car] 58.041480 [f6ab0d5f-2d85-4a19-bf20-baec498210d6, 67a4a3e... 205542 2021-11-30 22:57:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 3 Vinotion 30.000000 52.553246
205542 2021-11-30 22:57:00 wss://10.252.229.15/api/v2/data/main/object [] NaN [] 205543 2021-11-30 22:58:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 0 Vinotion NaN 52.526431
205543 2021-11-30 22:58:00 wss://10.252.229.15/api/v2/data/main/object [Car] 54.285480 [b7a7f85f-06f0-4ef1-b4c6-34d2b9d7a934] 205544 2021-11-30 22:59:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 1 Vinotion NaN 52.611206
205544 2021-11-30 22:59:00 wss://10.252.229.15/api/v2/data/main/object [Car] 54.156960 [522263fa-6df0-4280-afa6-365ffc0a552b] 205545 NaT TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 1 Vinotion NaN 52.698673

205545 rows × 14 columns

In [ ]:
# Calculate speed decrease compared to the average speed of the last hour
df_vinotion_SMD['speed_decrease'] = (df_vinotion_SMD['averageVehicleSpeed'] - df_vinotion_SMD['avg_speed_last_hour']) / df_vinotion_SMD['avg_speed_last_hour']
df_vinotion_SMD
Out[ ]:
dateObservedFrom source vehicleType averageVehicleSpeed name id dateObservedTo type dateCreated dateModified intensity dataProvider averageHeadwayTime avg_speed_last_hour speed_decrease
0 2021-11-02 09:51:00 wss://10.252.229.11/api/v2/data/main/object [Car, Car, Car, Truck, Car, Car, Car, Truck] 45.112545 [030e5136-bcb1-4162-b686-41cc63667155, 2880199... 1 2021-11-02 09:52:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 8 Vinotion 8.571429 45.112545 0.000000
1 2021-11-02 09:52:00 wss://10.252.229.11/api/v2/data/main/object [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... 38.883007 [05fcbbf5-29e8-440e-a243-409740d64788, b0b1935... 2 2021-11-02 09:53:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 24 Vinotion 2.608696 41.997776 -0.074165
2 2021-11-02 09:53:00 wss://10.252.229.11/api/v2/data/main/object [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... 39.584312 [de667d19-4473-474f-ad89-f674d8a73e88, b28dfb6... 3 2021-11-02 09:54:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 28 Vinotion 2.222222 41.193288 -0.039059
3 2021-11-02 09:54:00 wss://10.252.229.11/api/v2/data/main/object [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... 38.379448 [33f2ad13-292c-47df-8cbe-3e948f11d534, a855d4c... 4 2021-11-02 09:55:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 31 Vinotion 2.000000 40.489828 -0.052121
4 2021-11-02 09:55:00 wss://10.252.229.11/api/v2/data/main/object [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... 40.971777 [6dbbbee2-dc5d-4818-a30a-c68203ebb944, b7e7eff... 5 2021-11-02 09:56:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 23 Vinotion 2.727273 40.586218 0.009500
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
205540 2021-11-30 22:55:00 wss://10.252.229.15/api/v2/data/main/object [Car, Car] 56.446200 [9c7f1eed-9c40-46f1-95f0-27f79706dfeb, d7f225f... 205541 2021-11-30 22:56:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 2 Vinotion 60.000000 52.453460 0.076120
205541 2021-11-30 22:56:00 wss://10.252.229.15/api/v2/data/main/object [Car, Car, Car] 58.041480 [f6ab0d5f-2d85-4a19-bf20-baec498210d6, 67a4a3e... 205542 2021-11-30 22:57:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 3 Vinotion 30.000000 52.553246 0.104432
205542 2021-11-30 22:57:00 wss://10.252.229.15/api/v2/data/main/object [] NaN [] 205543 2021-11-30 22:58:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 0 Vinotion NaN 52.526431 NaN
205543 2021-11-30 22:58:00 wss://10.252.229.15/api/v2/data/main/object [Car] 54.285480 [b7a7f85f-06f0-4ef1-b4c6-34d2b9d7a934] 205544 2021-11-30 22:59:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 1 Vinotion NaN 52.611206 0.031824
205544 2021-11-30 22:59:00 wss://10.252.229.15/api/v2/data/main/object [Car] 54.156960 [522263fa-6df0-4280-afa6-365ffc0a552b] 205545 NaT TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 1 Vinotion NaN 52.698673 0.027672

205545 rows × 15 columns

In [ ]:
# Identify congested periods based on the defined threshold
congested_threshold = 0.2  # 20% speed decrease
congested_duration_threshold = 5  # 5 minutes
df_vinotion_SMD['congested'] = (df_vinotion_SMD['speed_decrease'] <= -congested_threshold).rolling(window=congested_duration_threshold).sum() >= congested_duration_threshold
df_vinotion_SMD
Out[ ]:
dateObservedFrom source vehicleType averageVehicleSpeed name id dateObservedTo type dateCreated dateModified intensity dataProvider averageHeadwayTime avg_speed_last_hour speed_decrease congested
0 2021-11-02 09:51:00 wss://10.252.229.11/api/v2/data/main/object [Car, Car, Car, Truck, Car, Car, Car, Truck] 45.112545 [030e5136-bcb1-4162-b686-41cc63667155, 2880199... 1 2021-11-02 09:52:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 8 Vinotion 8.571429 45.112545 0.000000 False
1 2021-11-02 09:52:00 wss://10.252.229.11/api/v2/data/main/object [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... 38.883007 [05fcbbf5-29e8-440e-a243-409740d64788, b0b1935... 2 2021-11-02 09:53:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 24 Vinotion 2.608696 41.997776 -0.074165 False
2 2021-11-02 09:53:00 wss://10.252.229.11/api/v2/data/main/object [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... 39.584312 [de667d19-4473-474f-ad89-f674d8a73e88, b28dfb6... 3 2021-11-02 09:54:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 28 Vinotion 2.222222 41.193288 -0.039059 False
3 2021-11-02 09:54:00 wss://10.252.229.11/api/v2/data/main/object [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... 38.379448 [33f2ad13-292c-47df-8cbe-3e948f11d534, a855d4c... 4 2021-11-02 09:55:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 31 Vinotion 2.000000 40.489828 -0.052121 False
4 2021-11-02 09:55:00 wss://10.252.229.11/api/v2/data/main/object [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... 40.971777 [6dbbbee2-dc5d-4818-a30a-c68203ebb944, b7e7eff... 5 2021-11-02 09:56:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 23 Vinotion 2.727273 40.586218 0.009500 False
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
205540 2021-11-30 22:55:00 wss://10.252.229.15/api/v2/data/main/object [Car, Car] 56.446200 [9c7f1eed-9c40-46f1-95f0-27f79706dfeb, d7f225f... 205541 2021-11-30 22:56:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 2 Vinotion 60.000000 52.453460 0.076120 False
205541 2021-11-30 22:56:00 wss://10.252.229.15/api/v2/data/main/object [Car, Car, Car] 58.041480 [f6ab0d5f-2d85-4a19-bf20-baec498210d6, 67a4a3e... 205542 2021-11-30 22:57:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 3 Vinotion 30.000000 52.553246 0.104432 False
205542 2021-11-30 22:57:00 wss://10.252.229.15/api/v2/data/main/object [] NaN [] 205543 2021-11-30 22:58:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 0 Vinotion NaN 52.526431 NaN False
205543 2021-11-30 22:58:00 wss://10.252.229.15/api/v2/data/main/object [Car] 54.285480 [b7a7f85f-06f0-4ef1-b4c6-34d2b9d7a934] 205544 2021-11-30 22:59:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 1 Vinotion NaN 52.611206 0.031824 False
205544 2021-11-30 22:59:00 wss://10.252.229.15/api/v2/data/main/object [Car] 54.156960 [522263fa-6df0-4280-afa6-365ffc0a552b] 205545 NaT TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 1 Vinotion NaN 52.698673 0.027672 False

205545 rows × 16 columns

In [ ]:
unique_values = df_vinotion_SMD['congested'].unique()
print(unique_values)
[False  True]
In [ ]:
# Count the number of True and False values in the 'congested' column
congested_counts = df_vinotion_SMD['congested'].value_counts()
print(congested_counts)
False    205402
True        143
Name: congested, dtype: int64

In the dataset there are 143 congested rows, and 205402 that are not congested, meaning majority are not congested.

In [ ]:
# Filter the DataFrame to display only rows where 'congested' is True
congested_true_df = df_vinotion_SMD[df_vinotion_SMD['congested'] == True]

# Display the filtered DataFrame
congested_true_df
Out[ ]:
dateObservedFrom source vehicleType averageVehicleSpeed name id dateObservedTo type dateCreated dateModified intensity dataProvider averageHeadwayTime avg_speed_last_hour speed_decrease congested
41495 2021-11-02 16:17:00 wss://10.252.229.12/api/v2/data/main/object [Car, Cycling, Car, Car, Car, Car, Car, Motorc... 32.308879 [2e43d99f-558a-4828-b432-4084c68dd52f, 3946c6f... 41496 2021-11-02 16:18:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 36 Vinotion 1.714286 51.197622 -0.368938 True
41501 2021-11-02 16:23:00 wss://10.252.229.12/api/v2/data/main/object [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... 30.933765 [8e05bf00-3258-4a9c-9b0d-855c8bb16fb9, bc59735... 41502 2021-11-02 16:24:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 38 Vinotion 1.621622 49.121578 -0.370261 True
41502 2021-11-02 16:24:00 wss://10.252.229.12/api/v2/data/main/object [Car, Motorcycling, Cycling, Car, Car, Car, Ca... 25.126082 [347ff0cc-de2c-47ef-a04d-546ee23ae335, 14fc51a... 41503 2021-11-02 16:25:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 36 Vinotion 1.714286 48.615450 -0.483167 True
41508 2021-11-02 16:30:00 wss://10.252.229.12/api/v2/data/main/object [Car, Cycling, Cycling, Car, Cycling, Car, Wal... 29.997070 [9e88f046-9b9d-416f-8aa5-e48eeb26f6dc, 76be04b... 41509 2021-11-02 16:31:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 37 Vinotion 1.666667 45.732962 -0.344082 True
41509 2021-11-02 16:31:00 wss://10.252.229.12/api/v2/data/main/object [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... 18.754191 [010ef025-89b8-489b-aba4-6d6fa21f8d83, ca15e18... 41510 2021-11-02 16:32:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 34 Vinotion 1.818182 45.049178 -0.583695 True
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
180668 2021-11-13 16:23:00 wss://10.252.229.15/api/v2/data/main/object [Car, Car, Car, Car, Car, Car, Car, Cycling, C... 27.225098 [8d594d33-1d73-4d48-b693-883d3e0a7e95, 9b9097a... 180669 2021-11-13 16:24:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 11 Vinotion 6.000000 42.689133 -0.362248 True
180676 2021-11-13 16:31:00 wss://10.252.229.15/api/v2/data/main/object [Cycling, Car, Car, Car, Car, Car, Car, Car, C... 28.946984 [504ed4d6-7aaa-430d-a6e7-f337bffb4cf4, c50d9b6... 180677 2021-11-13 16:32:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 10 Vinotion 6.666667 42.147019 -0.313190 True
180677 2021-11-13 16:32:00 wss://10.252.229.15/api/v2/data/main/object [Car, Car, Car, Car, Car, Car, Car, Cycling, C... 31.796481 [223fb515-f7f3-4651-9829-7f198ca6f784, a985b97... 180678 2021-11-13 16:33:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 19 Vinotion 3.333333 41.844839 -0.240134 True
180683 2021-11-13 16:38:00 wss://10.252.229.15/api/v2/data/main/object [Car, Car, Car, Car, Car, Car, Car, Car, Car] 29.953820 [d6d45086-0f16-4b59-ad40-eed3d821c6a0, 6c77827... 180684 2021-11-13 16:39:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 9 Vinotion 7.500000 39.294381 -0.237707 True
180684 2021-11-13 16:39:00 wss://10.252.229.15/api/v2/data/main/object [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... 31.171457 [8ba05ce3-db6b-4db6-8c4a-3d267cb03f3b, e10b88a... 180685 2021-11-13 16:40:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 10 Vinotion 6.666667 38.980352 -0.200329 True

143 rows × 16 columns

Drop the columns 'avg_speed_last_hour' and 'speed_decrease' since they are not needed in the SMD model.

In [ ]:
df_vinotion_SMD = df_vinotion_SMD.drop(columns=['avg_speed_last_hour', 'speed_decrease'])
df_vinotion_SMD
Out[ ]:
dateObservedFrom source vehicleType averageVehicleSpeed name id dateObservedTo type dateCreated dateModified intensity dataProvider averageHeadwayTime congested
0 2021-11-02 09:51:00 wss://10.252.229.11/api/v2/data/main/object [Car, Car, Car, Truck, Car, Car, Car, Truck] 45.112545 [030e5136-bcb1-4162-b686-41cc63667155, 2880199... 1 2021-11-02 09:52:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 8 Vinotion 8.571429 False
1 2021-11-02 09:52:00 wss://10.252.229.11/api/v2/data/main/object [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... 38.883007 [05fcbbf5-29e8-440e-a243-409740d64788, b0b1935... 2 2021-11-02 09:53:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 24 Vinotion 2.608696 False
2 2021-11-02 09:53:00 wss://10.252.229.11/api/v2/data/main/object [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... 39.584312 [de667d19-4473-474f-ad89-f674d8a73e88, b28dfb6... 3 2021-11-02 09:54:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 28 Vinotion 2.222222 False
3 2021-11-02 09:54:00 wss://10.252.229.11/api/v2/data/main/object [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... 38.379448 [33f2ad13-292c-47df-8cbe-3e948f11d534, a855d4c... 4 2021-11-02 09:55:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 31 Vinotion 2.000000 False
4 2021-11-02 09:55:00 wss://10.252.229.11/api/v2/data/main/object [Car, Car, Car, Car, Car, Car, Car, Car, Car, ... 40.971777 [6dbbbee2-dc5d-4818-a30a-c68203ebb944, b7e7eff... 5 2021-11-02 09:56:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 23 Vinotion 2.727273 False
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
205540 2021-11-30 22:55:00 wss://10.252.229.15/api/v2/data/main/object [Car, Car] 56.446200 [9c7f1eed-9c40-46f1-95f0-27f79706dfeb, d7f225f... 205541 2021-11-30 22:56:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 2 Vinotion 60.000000 False
205541 2021-11-30 22:56:00 wss://10.252.229.15/api/v2/data/main/object [Car, Car, Car] 58.041480 [f6ab0d5f-2d85-4a19-bf20-baec498210d6, 67a4a3e... 205542 2021-11-30 22:57:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 3 Vinotion 30.000000 False
205542 2021-11-30 22:57:00 wss://10.252.229.15/api/v2/data/main/object [] NaN [] 205543 2021-11-30 22:58:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 0 Vinotion NaN False
205543 2021-11-30 22:58:00 wss://10.252.229.15/api/v2/data/main/object [Car] 54.285480 [b7a7f85f-06f0-4ef1-b4c6-34d2b9d7a934] 205544 2021-11-30 22:59:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 1 Vinotion NaN False
205544 2021-11-30 22:59:00 wss://10.252.229.15/api/v2/data/main/object [Car] 54.156960 [522263fa-6df0-4280-afa6-365ffc0a552b] 205545 NaT TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 1 Vinotion NaN False

205545 rows × 14 columns

Converting columns to strings and replacing empty lists with NaN values.

In [ ]:
# Convert the 'vehicleType' column to string
df_vinotion_SMD['vehicleType'] = df_vinotion_SMD['vehicleType'].astype(str)

# Convert the 'name' column to string
df_vinotion_SMD['name'] = df_vinotion_SMD['name'].astype(str)

# Replace empty lists of 'vehicleType' with NaN
df_vinotion_SMD['vehicleType'] = df_vinotion_SMD['vehicleType'].replace('[]', np.nan)

# Replace empty string of 'vehicleType' with NaN
df_vinotion_SMD['name'] = df_vinotion_SMD['name'].replace('[]', np.nan)


df_vinotion_SMD
Out[ ]:
dateObservedFrom source vehicleType averageVehicleSpeed name id dateObservedTo type dateCreated dateModified intensity dataProvider averageHeadwayTime congested
0 2021-11-02 09:51:00 wss://10.252.229.11/api/v2/data/main/object ['Car', 'Car', 'Car', 'Truck', 'Car', 'Car', '... 45.112545 ['030e5136-bcb1-4162-b686-41cc63667155', '2880... 1 2021-11-02 09:52:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 8 Vinotion 8.571429 False
1 2021-11-02 09:52:00 wss://10.252.229.11/api/v2/data/main/object ['Car', 'Car', 'Car', 'Car', 'Car', 'Car', 'Ca... 38.883007 ['05fcbbf5-29e8-440e-a243-409740d64788', 'b0b1... 2 2021-11-02 09:53:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 24 Vinotion 2.608696 False
2 2021-11-02 09:53:00 wss://10.252.229.11/api/v2/data/main/object ['Car', 'Car', 'Car', 'Car', 'Car', 'Car', 'Ca... 39.584312 ['de667d19-4473-474f-ad89-f674d8a73e88', 'b28d... 3 2021-11-02 09:54:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 28 Vinotion 2.222222 False
3 2021-11-02 09:54:00 wss://10.252.229.11/api/v2/data/main/object ['Car', 'Car', 'Car', 'Car', 'Car', 'Car', 'Ca... 38.379448 ['33f2ad13-292c-47df-8cbe-3e948f11d534', 'a855... 4 2021-11-02 09:55:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 31 Vinotion 2.000000 False
4 2021-11-02 09:55:00 wss://10.252.229.11/api/v2/data/main/object ['Car', 'Car', 'Car', 'Car', 'Car', 'Car', 'Ca... 40.971777 ['6dbbbee2-dc5d-4818-a30a-c68203ebb944', 'b7e7... 5 2021-11-02 09:56:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 23 Vinotion 2.727273 False
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
205540 2021-11-30 22:55:00 wss://10.252.229.15/api/v2/data/main/object ['Car', 'Car'] 56.446200 ['9c7f1eed-9c40-46f1-95f0-27f79706dfeb', 'd7f2... 205541 2021-11-30 22:56:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 2 Vinotion 60.000000 False
205541 2021-11-30 22:56:00 wss://10.252.229.15/api/v2/data/main/object ['Car', 'Car', 'Car'] 58.041480 ['f6ab0d5f-2d85-4a19-bf20-baec498210d6', '67a4... 205542 2021-11-30 22:57:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 3 Vinotion 30.000000 False
205542 2021-11-30 22:57:00 wss://10.252.229.15/api/v2/data/main/object NaN NaN NaN 205543 2021-11-30 22:58:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 0 Vinotion NaN False
205543 2021-11-30 22:58:00 wss://10.252.229.15/api/v2/data/main/object ['Car'] 54.285480 ['b7a7f85f-06f0-4ef1-b4c6-34d2b9d7a934'] 205544 2021-11-30 22:59:00 TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 1 Vinotion NaN False
205544 2021-11-30 22:59:00 wss://10.252.229.15/api/v2/data/main/object ['Car'] 54.156960 ['522263fa-6df0-4280-afa6-365ffc0a552b'] 205545 NaT TrafficFlowObserved 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993 1 Vinotion NaN False

205545 rows × 14 columns

Re-ordering columns to make it more readable.

In [ ]:
#re-ordering columns
df_vinotion_SMD = df_vinotion_SMD[['id','dataProvider','type','name','source','dateObservedFrom','dateObservedTo','vehicleType',
                                   'averageVehicleSpeed','intensity','averageHeadwayTime','congested','dateCreated','dateModified']]

df_vinotion_SMD
Out[ ]:
id dataProvider type name source dateObservedFrom dateObservedTo vehicleType averageVehicleSpeed intensity averageHeadwayTime congested dateCreated dateModified
0 1 Vinotion TrafficFlowObserved ['030e5136-bcb1-4162-b686-41cc63667155', '2880... wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:51:00 2021-11-02 09:52:00 ['Car', 'Car', 'Car', 'Truck', 'Car', 'Car', '... 45.112545 8 8.571429 False 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993
1 2 Vinotion TrafficFlowObserved ['05fcbbf5-29e8-440e-a243-409740d64788', 'b0b1... wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:52:00 2021-11-02 09:53:00 ['Car', 'Car', 'Car', 'Car', 'Car', 'Car', 'Ca... 38.883007 24 2.608696 False 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993
2 3 Vinotion TrafficFlowObserved ['de667d19-4473-474f-ad89-f674d8a73e88', 'b28d... wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:53:00 2021-11-02 09:54:00 ['Car', 'Car', 'Car', 'Car', 'Car', 'Car', 'Ca... 39.584312 28 2.222222 False 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993
3 4 Vinotion TrafficFlowObserved ['33f2ad13-292c-47df-8cbe-3e948f11d534', 'a855... wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:54:00 2021-11-02 09:55:00 ['Car', 'Car', 'Car', 'Car', 'Car', 'Car', 'Ca... 38.379448 31 2.000000 False 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993
4 5 Vinotion TrafficFlowObserved ['6dbbbee2-dc5d-4818-a30a-c68203ebb944', 'b7e7... wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:55:00 2021-11-02 09:56:00 ['Car', 'Car', 'Car', 'Car', 'Car', 'Car', 'Ca... 40.971777 23 2.727273 False 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
205540 205541 Vinotion TrafficFlowObserved ['9c7f1eed-9c40-46f1-95f0-27f79706dfeb', 'd7f2... wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:55:00 2021-11-30 22:56:00 ['Car', 'Car'] 56.446200 2 60.000000 False 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993
205541 205542 Vinotion TrafficFlowObserved ['f6ab0d5f-2d85-4a19-bf20-baec498210d6', '67a4... wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:56:00 2021-11-30 22:57:00 ['Car', 'Car', 'Car'] 58.041480 3 30.000000 False 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993
205542 205543 Vinotion TrafficFlowObserved NaN wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:57:00 2021-11-30 22:58:00 NaN NaN 0 NaN False 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993
205543 205544 Vinotion TrafficFlowObserved ['b7a7f85f-06f0-4ef1-b4c6-34d2b9d7a934'] wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:58:00 2021-11-30 22:59:00 ['Car'] 54.285480 1 NaN False 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993
205544 205545 Vinotion TrafficFlowObserved ['522263fa-6df0-4280-afa6-365ffc0a552b'] wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:59:00 NaT ['Car'] 54.156960 1 NaN False 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993

205545 rows × 14 columns

In [ ]:
df_vinotion_SMD
Out[ ]:
id dataProvider type name source dateObservedFrom dateObservedTo vehicleType averageVehicleSpeed intensity averageHeadwayTime congested dateCreated dateModified
0 1 Vinotion TrafficFlowObserved ['030e5136-bcb1-4162-b686-41cc63667155', '2880... wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:51:00 2021-11-02 09:52:00 ['Car', 'Car', 'Car', 'Truck', 'Car', 'Car', '... 45.112545 8 8.571429 False 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993
1 2 Vinotion TrafficFlowObserved ['05fcbbf5-29e8-440e-a243-409740d64788', 'b0b1... wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:52:00 2021-11-02 09:53:00 ['Car', 'Car', 'Car', 'Car', 'Car', 'Car', 'Ca... 38.883007 24 2.608696 False 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993
2 3 Vinotion TrafficFlowObserved ['de667d19-4473-474f-ad89-f674d8a73e88', 'b28d... wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:53:00 2021-11-02 09:54:00 ['Car', 'Car', 'Car', 'Car', 'Car', 'Car', 'Ca... 39.584312 28 2.222222 False 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993
3 4 Vinotion TrafficFlowObserved ['33f2ad13-292c-47df-8cbe-3e948f11d534', 'a855... wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:54:00 2021-11-02 09:55:00 ['Car', 'Car', 'Car', 'Car', 'Car', 'Car', 'Ca... 38.379448 31 2.000000 False 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993
4 5 Vinotion TrafficFlowObserved ['6dbbbee2-dc5d-4818-a30a-c68203ebb944', 'b7e7... wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:55:00 2021-11-02 09:56:00 ['Car', 'Car', 'Car', 'Car', 'Car', 'Car', 'Ca... 40.971777 23 2.727273 False 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
205540 205541 Vinotion TrafficFlowObserved ['9c7f1eed-9c40-46f1-95f0-27f79706dfeb', 'd7f2... wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:55:00 2021-11-30 22:56:00 ['Car', 'Car'] 56.446200 2 60.000000 False 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993
205541 205542 Vinotion TrafficFlowObserved ['f6ab0d5f-2d85-4a19-bf20-baec498210d6', '67a4... wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:56:00 2021-11-30 22:57:00 ['Car', 'Car', 'Car'] 58.041480 3 30.000000 False 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993
205542 205543 Vinotion TrafficFlowObserved NaN wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:57:00 2021-11-30 22:58:00 NaN NaN 0 NaN False 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993
205543 205544 Vinotion TrafficFlowObserved ['b7a7f85f-06f0-4ef1-b4c6-34d2b9d7a934'] wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:58:00 2021-11-30 22:59:00 ['Car'] 54.285480 1 NaN False 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993
205544 205545 Vinotion TrafficFlowObserved ['522263fa-6df0-4280-afa6-365ffc0a552b'] wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:59:00 NaT ['Car'] 54.156960 1 NaN False 2024-07-11 12:45:57.473436 2024-07-11 12:45:57.475993

205545 rows × 14 columns

The data frame table above shows the end result of how the data that will be published for Vinotion would look like.

In [ ]:
df_vinotion_SMD.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 205545 entries, 0 to 205544
Data columns (total 14 columns):
 #   Column               Non-Null Count   Dtype         
---  ------               --------------   -----         
 0   id                   205545 non-null  int64         
 1   dataProvider         205545 non-null  object        
 2   type                 205545 non-null  object        
 3   name                 176949 non-null  object        
 4   source               205545 non-null  object        
 5   dateObservedFrom     205545 non-null  datetime64[ns]
 6   dateObservedTo       205540 non-null  datetime64[ns]
 7   vehicleType          176949 non-null  object        
 8   averageVehicleSpeed  176949 non-null  float64       
 9   intensity            205545 non-null  int64         
 10  averageHeadwayTime   158418 non-null  float64       
 11  congested            205545 non-null  bool          
 12  dateCreated          205545 non-null  datetime64[ns]
 13  dateModified         205545 non-null  datetime64[ns]
dtypes: bool(1), datetime64[ns](4), float64(2), int64(2), object(5)
memory usage: 20.6+ MB

3.2) Sorama Conversion¶

Next is to convert the Sorama to the NoiseLevelObserved Smart Data Model. The data is aggregated inheritly to one second durations, therefore it is not required to aggregate the data more, although it can be aggregated to larger intervals, such as one minute intervals.

In [ ]:
df_sorama_full = pd.read_csv (r'/Users/maxwellernst/Documents/data/modified_data.csv', sep=',')
df_sorama_full
Out[ ]:
deviceId Content_time Content_value Date Time Year Month Hour Datetime
0 2,03804E+18 2021-11-02 09:49:22.310 57.077148 2021-11-02 1900-01-01 09:49:22.310 2021 11 9 2021-11-02 09:49:22.310000-01:00
1 2,03804E+18 2021-11-02 09:49:23.310 57.639920 2021-11-02 1900-01-01 09:49:23.310 2021 11 9 2021-11-02 09:49:23.310000-01:00
2 2,03804E+18 2021-11-02 09:49:24.310 58.501630 2021-11-02 1900-01-01 09:49:24.310 2021 11 9 2021-11-02 09:49:24.310000-01:00
3 2,03804E+18 2021-11-02 09:49:25.310 57.684305 2021-11-02 1900-01-01 09:49:25.310 2021 11 9 2021-11-02 09:49:25.310000-01:00
4 2,03804E+18 2021-11-02 09:49:26.310 57.416267 2021-11-02 1900-01-01 09:49:26.310 2021 11 9 2021-11-02 09:49:26.310000-01:00
... ... ... ... ... ... ... ... ... ...
1048570 2,03804E+18 2021-11-02 09:49:06.610 58.866556 2021-11-02 1900-01-01 09:49:06.610 2021 11 9 2021-11-02 09:49:06.610000-01:00
1048571 2,03804E+18 2021-11-02 09:49:07.610 57.896402 2021-11-02 1900-01-01 09:49:07.610 2021 11 9 2021-11-02 09:49:07.610000-01:00
1048572 2,03804E+18 2021-11-02 09:49:08.610 58.556143 2021-11-02 1900-01-01 09:49:08.610 2021 11 9 2021-11-02 09:49:08.610000-01:00
1048573 2,03804E+18 2021-11-02 09:49:09.610 58.915390 2021-11-02 1900-01-01 09:49:09.610 2021 11 9 2021-11-02 09:49:09.610000-01:00
1048574 2,03804E+18 2021-11-02 09:49:10.610 59.383412 2021-11-02 1900-01-01 09:49:10.610 2021 11 9 2021-11-02 09:49:10.610000-01:00

1048575 rows × 9 columns

Select specific columns and create new DataFrame.

In [ ]:
df_sorama_SMD = df_sorama_full[['deviceId', 'Content_time', 'Content_value']].copy()
df_sorama_SMD
Out[ ]:
deviceId Content_time Content_value
0 2,03804E+18 2021-11-02 09:49:22.310 57.077148
1 2,03804E+18 2021-11-02 09:49:23.310 57.639920
2 2,03804E+18 2021-11-02 09:49:24.310 58.501630
3 2,03804E+18 2021-11-02 09:49:25.310 57.684305
4 2,03804E+18 2021-11-02 09:49:26.310 57.416267
... ... ... ...
1048570 2,03804E+18 2021-11-02 09:49:06.610 58.866556
1048571 2,03804E+18 2021-11-02 09:49:07.610 57.896402
1048572 2,03804E+18 2021-11-02 09:49:08.610 58.556143
1048573 2,03804E+18 2021-11-02 09:49:09.610 58.915390
1048574 2,03804E+18 2021-11-02 09:49:10.610 59.383412

1048575 rows × 3 columns

Renaming datetime_short to dateObservedTo

In [ ]:
df_sorama_SMD['dateObservedTo'] = df_sorama_SMD['Content_time'].copy()
df_sorama_SMD
Out[ ]:
deviceId Content_time Content_value dateObservedTo
0 2,03804E+18 2021-11-02 09:49:22.310 57.077148 2021-11-02 09:49:22.310
1 2,03804E+18 2021-11-02 09:49:23.310 57.639920 2021-11-02 09:49:23.310
2 2,03804E+18 2021-11-02 09:49:24.310 58.501630 2021-11-02 09:49:24.310
3 2,03804E+18 2021-11-02 09:49:25.310 57.684305 2021-11-02 09:49:25.310
4 2,03804E+18 2021-11-02 09:49:26.310 57.416267 2021-11-02 09:49:26.310
... ... ... ... ...
1048570 2,03804E+18 2021-11-02 09:49:06.610 58.866556 2021-11-02 09:49:06.610
1048571 2,03804E+18 2021-11-02 09:49:07.610 57.896402 2021-11-02 09:49:07.610
1048572 2,03804E+18 2021-11-02 09:49:08.610 58.556143 2021-11-02 09:49:08.610
1048573 2,03804E+18 2021-11-02 09:49:09.610 58.915390 2021-11-02 09:49:09.610
1048574 2,03804E+18 2021-11-02 09:49:10.610 59.383412 2021-11-02 09:49:10.610

1048575 rows × 4 columns

Adding the "type" column which is "NoiseLevelObserved" Smart Data Model.

In [ ]:
#TrafficFlowObserved
df_sorama_SMD['type'] = 'NoiseLevelObserved'
df_sorama_SMD
Out[ ]:
deviceId Content_time Content_value dateObservedTo type
0 2,03804E+18 2021-11-02 09:49:22.310 57.077148 2021-11-02 09:49:22.310 NoiseLevelObserved
1 2,03804E+18 2021-11-02 09:49:23.310 57.639920 2021-11-02 09:49:23.310 NoiseLevelObserved
2 2,03804E+18 2021-11-02 09:49:24.310 58.501630 2021-11-02 09:49:24.310 NoiseLevelObserved
3 2,03804E+18 2021-11-02 09:49:25.310 57.684305 2021-11-02 09:49:25.310 NoiseLevelObserved
4 2,03804E+18 2021-11-02 09:49:26.310 57.416267 2021-11-02 09:49:26.310 NoiseLevelObserved
... ... ... ... ... ...
1048570 2,03804E+18 2021-11-02 09:49:06.610 58.866556 2021-11-02 09:49:06.610 NoiseLevelObserved
1048571 2,03804E+18 2021-11-02 09:49:07.610 57.896402 2021-11-02 09:49:07.610 NoiseLevelObserved
1048572 2,03804E+18 2021-11-02 09:49:08.610 58.556143 2021-11-02 09:49:08.610 NoiseLevelObserved
1048573 2,03804E+18 2021-11-02 09:49:09.610 58.915390 2021-11-02 09:49:09.610 NoiseLevelObserved
1048574 2,03804E+18 2021-11-02 09:49:10.610 59.383412 2021-11-02 09:49:10.610 NoiseLevelObserved

1048575 rows × 5 columns

Renaming columns to match the Smart Data Model.

In [ ]:
df_sorama_SMD = df_sorama_SMD.rename(columns={'deviceId': 'refDevice', 'Content_time': 'dateObservedFrom', 'Content_value':'LAeq'})
df_sorama_SMD
Out[ ]:
refDevice dateObservedFrom LAeq dateObservedTo type
0 2,03804E+18 2021-11-02 09:49:22.310 57.077148 2021-11-02 09:49:22.310 NoiseLevelObserved
1 2,03804E+18 2021-11-02 09:49:23.310 57.639920 2021-11-02 09:49:23.310 NoiseLevelObserved
2 2,03804E+18 2021-11-02 09:49:24.310 58.501630 2021-11-02 09:49:24.310 NoiseLevelObserved
3 2,03804E+18 2021-11-02 09:49:25.310 57.684305 2021-11-02 09:49:25.310 NoiseLevelObserved
4 2,03804E+18 2021-11-02 09:49:26.310 57.416267 2021-11-02 09:49:26.310 NoiseLevelObserved
... ... ... ... ... ...
1048570 2,03804E+18 2021-11-02 09:49:06.610 58.866556 2021-11-02 09:49:06.610 NoiseLevelObserved
1048571 2,03804E+18 2021-11-02 09:49:07.610 57.896402 2021-11-02 09:49:07.610 NoiseLevelObserved
1048572 2,03804E+18 2021-11-02 09:49:08.610 58.556143 2021-11-02 09:49:08.610 NoiseLevelObserved
1048573 2,03804E+18 2021-11-02 09:49:09.610 58.915390 2021-11-02 09:49:09.610 NoiseLevelObserved
1048574 2,03804E+18 2021-11-02 09:49:10.610 59.383412 2021-11-02 09:49:10.610 NoiseLevelObserved

1048575 rows × 5 columns

In [ ]:
# Add dateCreated and dateModified columns
df_sorama_SMD['dateCreated'] = pd.Timestamp.now()
df_sorama_SMD['dateModified'] = pd.Timestamp.now()
df_sorama_SMD
Out[ ]:
refDevice dateObservedFrom LAeq dateObservedTo type dateCreated dateModified
0 2,03804E+18 2021-11-02 09:49:22.310 57.077148 2021-11-02 09:49:22.310 NoiseLevelObserved 2024-07-11 12:46:03.369713 2024-07-11 12:46:03.371162
1 2,03804E+18 2021-11-02 09:49:23.310 57.639920 2021-11-02 09:49:23.310 NoiseLevelObserved 2024-07-11 12:46:03.369713 2024-07-11 12:46:03.371162
2 2,03804E+18 2021-11-02 09:49:24.310 58.501630 2021-11-02 09:49:24.310 NoiseLevelObserved 2024-07-11 12:46:03.369713 2024-07-11 12:46:03.371162
3 2,03804E+18 2021-11-02 09:49:25.310 57.684305 2021-11-02 09:49:25.310 NoiseLevelObserved 2024-07-11 12:46:03.369713 2024-07-11 12:46:03.371162
4 2,03804E+18 2021-11-02 09:49:26.310 57.416267 2021-11-02 09:49:26.310 NoiseLevelObserved 2024-07-11 12:46:03.369713 2024-07-11 12:46:03.371162
... ... ... ... ... ... ... ...
1048570 2,03804E+18 2021-11-02 09:49:06.610 58.866556 2021-11-02 09:49:06.610 NoiseLevelObserved 2024-07-11 12:46:03.369713 2024-07-11 12:46:03.371162
1048571 2,03804E+18 2021-11-02 09:49:07.610 57.896402 2021-11-02 09:49:07.610 NoiseLevelObserved 2024-07-11 12:46:03.369713 2024-07-11 12:46:03.371162
1048572 2,03804E+18 2021-11-02 09:49:08.610 58.556143 2021-11-02 09:49:08.610 NoiseLevelObserved 2024-07-11 12:46:03.369713 2024-07-11 12:46:03.371162
1048573 2,03804E+18 2021-11-02 09:49:09.610 58.915390 2021-11-02 09:49:09.610 NoiseLevelObserved 2024-07-11 12:46:03.369713 2024-07-11 12:46:03.371162
1048574 2,03804E+18 2021-11-02 09:49:10.610 59.383412 2021-11-02 09:49:10.610 NoiseLevelObserved 2024-07-11 12:46:03.369713 2024-07-11 12:46:03.371162

1048575 rows × 7 columns

Adding dataProvider column which is "Sorama".

In [ ]:
# Add dataProvider column
df_sorama_SMD['dataProvider'] = 'Sorama'
df_sorama_SMD
Out[ ]:
refDevice dateObservedFrom LAeq dateObservedTo type dateCreated dateModified dataProvider
0 2,03804E+18 2021-11-02 09:49:22.310 57.077148 2021-11-02 09:49:22.310 NoiseLevelObserved 2024-07-11 12:46:03.369713 2024-07-11 12:46:03.371162 Sorama
1 2,03804E+18 2021-11-02 09:49:23.310 57.639920 2021-11-02 09:49:23.310 NoiseLevelObserved 2024-07-11 12:46:03.369713 2024-07-11 12:46:03.371162 Sorama
2 2,03804E+18 2021-11-02 09:49:24.310 58.501630 2021-11-02 09:49:24.310 NoiseLevelObserved 2024-07-11 12:46:03.369713 2024-07-11 12:46:03.371162 Sorama
3 2,03804E+18 2021-11-02 09:49:25.310 57.684305 2021-11-02 09:49:25.310 NoiseLevelObserved 2024-07-11 12:46:03.369713 2024-07-11 12:46:03.371162 Sorama
4 2,03804E+18 2021-11-02 09:49:26.310 57.416267 2021-11-02 09:49:26.310 NoiseLevelObserved 2024-07-11 12:46:03.369713 2024-07-11 12:46:03.371162 Sorama
... ... ... ... ... ... ... ... ...
1048570 2,03804E+18 2021-11-02 09:49:06.610 58.866556 2021-11-02 09:49:06.610 NoiseLevelObserved 2024-07-11 12:46:03.369713 2024-07-11 12:46:03.371162 Sorama
1048571 2,03804E+18 2021-11-02 09:49:07.610 57.896402 2021-11-02 09:49:07.610 NoiseLevelObserved 2024-07-11 12:46:03.369713 2024-07-11 12:46:03.371162 Sorama
1048572 2,03804E+18 2021-11-02 09:49:08.610 58.556143 2021-11-02 09:49:08.610 NoiseLevelObserved 2024-07-11 12:46:03.369713 2024-07-11 12:46:03.371162 Sorama
1048573 2,03804E+18 2021-11-02 09:49:09.610 58.915390 2021-11-02 09:49:09.610 NoiseLevelObserved 2024-07-11 12:46:03.369713 2024-07-11 12:46:03.371162 Sorama
1048574 2,03804E+18 2021-11-02 09:49:10.610 59.383412 2021-11-02 09:49:10.610 NoiseLevelObserved 2024-07-11 12:46:03.369713 2024-07-11 12:46:03.371162 Sorama

1048575 rows × 8 columns

Addiing an id column as a primary key column, this can be changed to be more specific, in this case it's numbered from 1 till the end.

In [ ]:
# Add a unique id column (starting from 1)
df_sorama_SMD['id'] = range(1, len(df_sorama_SMD) + 1)
df_sorama_SMD
Out[ ]:
refDevice dateObservedFrom LAeq dateObservedTo type dateCreated dateModified dataProvider id
0 2,03804E+18 2021-11-02 09:49:22.310 57.077148 2021-11-02 09:49:22.310 NoiseLevelObserved 2024-07-11 12:46:03.369713 2024-07-11 12:46:03.371162 Sorama 1
1 2,03804E+18 2021-11-02 09:49:23.310 57.639920 2021-11-02 09:49:23.310 NoiseLevelObserved 2024-07-11 12:46:03.369713 2024-07-11 12:46:03.371162 Sorama 2
2 2,03804E+18 2021-11-02 09:49:24.310 58.501630 2021-11-02 09:49:24.310 NoiseLevelObserved 2024-07-11 12:46:03.369713 2024-07-11 12:46:03.371162 Sorama 3
3 2,03804E+18 2021-11-02 09:49:25.310 57.684305 2021-11-02 09:49:25.310 NoiseLevelObserved 2024-07-11 12:46:03.369713 2024-07-11 12:46:03.371162 Sorama 4
4 2,03804E+18 2021-11-02 09:49:26.310 57.416267 2021-11-02 09:49:26.310 NoiseLevelObserved 2024-07-11 12:46:03.369713 2024-07-11 12:46:03.371162 Sorama 5
... ... ... ... ... ... ... ... ... ...
1048570 2,03804E+18 2021-11-02 09:49:06.610 58.866556 2021-11-02 09:49:06.610 NoiseLevelObserved 2024-07-11 12:46:03.369713 2024-07-11 12:46:03.371162 Sorama 1048571
1048571 2,03804E+18 2021-11-02 09:49:07.610 57.896402 2021-11-02 09:49:07.610 NoiseLevelObserved 2024-07-11 12:46:03.369713 2024-07-11 12:46:03.371162 Sorama 1048572
1048572 2,03804E+18 2021-11-02 09:49:08.610 58.556143 2021-11-02 09:49:08.610 NoiseLevelObserved 2024-07-11 12:46:03.369713 2024-07-11 12:46:03.371162 Sorama 1048573
1048573 2,03804E+18 2021-11-02 09:49:09.610 58.915390 2021-11-02 09:49:09.610 NoiseLevelObserved 2024-07-11 12:46:03.369713 2024-07-11 12:46:03.371162 Sorama 1048574
1048574 2,03804E+18 2021-11-02 09:49:10.610 59.383412 2021-11-02 09:49:10.610 NoiseLevelObserved 2024-07-11 12:46:03.369713 2024-07-11 12:46:03.371162 Sorama 1048575

1048575 rows × 9 columns

The locations are required for the Sorama sound sensors. To get the locations the image below is used with google maps to manually aquire the locations.

Image 005

Adding the location column with the coordinates from the image above and google maps. These co-ordinates is from google maps and maybe incorrect, and is currently only for one device (listener 5 or 7 or 8 ,not sure) since the DeviceID is rounded off in the CSV file. I checked to see if I could revert it or if excel was doing it but it seems to not be the case.

In [ ]:
coordinates = "5.481549426062068, 51.452869111964304"
df_sorama_SMD['location'] = [coordinates] * len(df_sorama_SMD)
df_sorama_SMD
Out[ ]:
refDevice dateObservedFrom LAeq dateObservedTo type dateCreated dateModified dataProvider id location
0 2,03804E+18 2021-11-02 09:49:22.310 57.077148 2021-11-02 09:49:22.310 NoiseLevelObserved 2024-07-11 12:46:03.369713 2024-07-11 12:46:03.371162 Sorama 1 5.481549426062068, 51.452869111964304
1 2,03804E+18 2021-11-02 09:49:23.310 57.639920 2021-11-02 09:49:23.310 NoiseLevelObserved 2024-07-11 12:46:03.369713 2024-07-11 12:46:03.371162 Sorama 2 5.481549426062068, 51.452869111964304
2 2,03804E+18 2021-11-02 09:49:24.310 58.501630 2021-11-02 09:49:24.310 NoiseLevelObserved 2024-07-11 12:46:03.369713 2024-07-11 12:46:03.371162 Sorama 3 5.481549426062068, 51.452869111964304
3 2,03804E+18 2021-11-02 09:49:25.310 57.684305 2021-11-02 09:49:25.310 NoiseLevelObserved 2024-07-11 12:46:03.369713 2024-07-11 12:46:03.371162 Sorama 4 5.481549426062068, 51.452869111964304
4 2,03804E+18 2021-11-02 09:49:26.310 57.416267 2021-11-02 09:49:26.310 NoiseLevelObserved 2024-07-11 12:46:03.369713 2024-07-11 12:46:03.371162 Sorama 5 5.481549426062068, 51.452869111964304
... ... ... ... ... ... ... ... ... ... ...
1048570 2,03804E+18 2021-11-02 09:49:06.610 58.866556 2021-11-02 09:49:06.610 NoiseLevelObserved 2024-07-11 12:46:03.369713 2024-07-11 12:46:03.371162 Sorama 1048571 5.481549426062068, 51.452869111964304
1048571 2,03804E+18 2021-11-02 09:49:07.610 57.896402 2021-11-02 09:49:07.610 NoiseLevelObserved 2024-07-11 12:46:03.369713 2024-07-11 12:46:03.371162 Sorama 1048572 5.481549426062068, 51.452869111964304
1048572 2,03804E+18 2021-11-02 09:49:08.610 58.556143 2021-11-02 09:49:08.610 NoiseLevelObserved 2024-07-11 12:46:03.369713 2024-07-11 12:46:03.371162 Sorama 1048573 5.481549426062068, 51.452869111964304
1048573 2,03804E+18 2021-11-02 09:49:09.610 58.915390 2021-11-02 09:49:09.610 NoiseLevelObserved 2024-07-11 12:46:03.369713 2024-07-11 12:46:03.371162 Sorama 1048574 5.481549426062068, 51.452869111964304
1048574 2,03804E+18 2021-11-02 09:49:10.610 59.383412 2021-11-02 09:49:10.610 NoiseLevelObserved 2024-07-11 12:46:03.369713 2024-07-11 12:46:03.371162 Sorama 1048575 5.481549426062068, 51.452869111964304

1048575 rows × 10 columns

In [ ]:
df_sorama_SMD.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1048575 entries, 0 to 1048574
Data columns (total 10 columns):
 #   Column            Non-Null Count    Dtype         
---  ------            --------------    -----         
 0   refDevice         1048575 non-null  object        
 1   dateObservedFrom  1048575 non-null  object        
 2   LAeq              1048575 non-null  float64       
 3   dateObservedTo    1048575 non-null  object        
 4   type              1048575 non-null  object        
 5   dateCreated       1048575 non-null  datetime64[ns]
 6   dateModified      1048575 non-null  datetime64[ns]
 7   dataProvider      1048575 non-null  object        
 8   id                1048575 non-null  int64         
 9   location          1048575 non-null  object        
dtypes: datetime64[ns](2), float64(1), int64(1), object(6)
memory usage: 80.0+ MB

Converting columns types to match Smart Data Model types.

In [ ]:
# Convert dateObservedFrom and dateObservedTo to datetime
df_sorama_SMD['dateObservedFrom'] = pd.to_datetime(df_sorama_SMD['dateObservedFrom'], errors='coerce')
df_sorama_SMD['dateObservedTo'] = pd.to_datetime(df_sorama_SMD['dateObservedTo'], errors='coerce')

Convert type to string.

In [ ]:
df_sorama_SMD['type'] = df_sorama_SMD['type'].astype(str)

Convert refDevice to string.

In [ ]:
df_sorama_SMD['refDevice'] = df_sorama_SMD['refDevice'].astype(str)

Convert LAeq to numeric (float).

In [ ]:
df_sorama_SMD['LAeq'] = pd.to_numeric(df_sorama_SMD['LAeq'], errors='coerce')

Convert dataProvider to string.

In [ ]:
df_sorama_SMD['dataProvider'] = df_sorama_SMD['dataProvider'].astype(str)

Re-ordering columns to make the data more readable.

In [ ]:
df_sorama_SMD = df_sorama_SMD[['id','dataProvider','type','refDevice', 'dateObservedFrom', 'dateObservedTo', 'LAeq','location','dateCreated','dateModified']]
df_sorama_SMD
Out[ ]:
id dataProvider type refDevice dateObservedFrom dateObservedTo LAeq location dateCreated dateModified
0 1 Sorama NoiseLevelObserved 2,03804E+18 2021-11-02 09:49:22.310 2021-11-02 09:49:22.310 57.077148 5.481549426062068, 51.452869111964304 2024-07-11 12:46:03.369713 2024-07-11 12:46:03.371162
1 2 Sorama NoiseLevelObserved 2,03804E+18 2021-11-02 09:49:23.310 2021-11-02 09:49:23.310 57.639920 5.481549426062068, 51.452869111964304 2024-07-11 12:46:03.369713 2024-07-11 12:46:03.371162
2 3 Sorama NoiseLevelObserved 2,03804E+18 2021-11-02 09:49:24.310 2021-11-02 09:49:24.310 58.501630 5.481549426062068, 51.452869111964304 2024-07-11 12:46:03.369713 2024-07-11 12:46:03.371162
3 4 Sorama NoiseLevelObserved 2,03804E+18 2021-11-02 09:49:25.310 2021-11-02 09:49:25.310 57.684305 5.481549426062068, 51.452869111964304 2024-07-11 12:46:03.369713 2024-07-11 12:46:03.371162
4 5 Sorama NoiseLevelObserved 2,03804E+18 2021-11-02 09:49:26.310 2021-11-02 09:49:26.310 57.416267 5.481549426062068, 51.452869111964304 2024-07-11 12:46:03.369713 2024-07-11 12:46:03.371162
... ... ... ... ... ... ... ... ... ... ...
1048570 1048571 Sorama NoiseLevelObserved 2,03804E+18 2021-11-02 09:49:06.610 2021-11-02 09:49:06.610 58.866556 5.481549426062068, 51.452869111964304 2024-07-11 12:46:03.369713 2024-07-11 12:46:03.371162
1048571 1048572 Sorama NoiseLevelObserved 2,03804E+18 2021-11-02 09:49:07.610 2021-11-02 09:49:07.610 57.896402 5.481549426062068, 51.452869111964304 2024-07-11 12:46:03.369713 2024-07-11 12:46:03.371162
1048572 1048573 Sorama NoiseLevelObserved 2,03804E+18 2021-11-02 09:49:08.610 2021-11-02 09:49:08.610 58.556143 5.481549426062068, 51.452869111964304 2024-07-11 12:46:03.369713 2024-07-11 12:46:03.371162
1048573 1048574 Sorama NoiseLevelObserved 2,03804E+18 2021-11-02 09:49:09.610 2021-11-02 09:49:09.610 58.915390 5.481549426062068, 51.452869111964304 2024-07-11 12:46:03.369713 2024-07-11 12:46:03.371162
1048574 1048575 Sorama NoiseLevelObserved 2,03804E+18 2021-11-02 09:49:10.610 2021-11-02 09:49:10.610 59.383412 5.481549426062068, 51.452869111964304 2024-07-11 12:46:03.369713 2024-07-11 12:46:03.371162

1048575 rows × 10 columns

In [ ]:
df_sorama_SMD
Out[ ]:
id dataProvider type refDevice dateObservedFrom dateObservedTo LAeq location dateCreated dateModified
0 1 Sorama NoiseLevelObserved 2,03804E+18 2021-11-02 09:49:22.310 2021-11-02 09:49:22.310 57.077148 5.481549426062068, 51.452869111964304 2024-07-11 12:46:03.369713 2024-07-11 12:46:03.371162
1 2 Sorama NoiseLevelObserved 2,03804E+18 2021-11-02 09:49:23.310 2021-11-02 09:49:23.310 57.639920 5.481549426062068, 51.452869111964304 2024-07-11 12:46:03.369713 2024-07-11 12:46:03.371162
2 3 Sorama NoiseLevelObserved 2,03804E+18 2021-11-02 09:49:24.310 2021-11-02 09:49:24.310 58.501630 5.481549426062068, 51.452869111964304 2024-07-11 12:46:03.369713 2024-07-11 12:46:03.371162
3 4 Sorama NoiseLevelObserved 2,03804E+18 2021-11-02 09:49:25.310 2021-11-02 09:49:25.310 57.684305 5.481549426062068, 51.452869111964304 2024-07-11 12:46:03.369713 2024-07-11 12:46:03.371162
4 5 Sorama NoiseLevelObserved 2,03804E+18 2021-11-02 09:49:26.310 2021-11-02 09:49:26.310 57.416267 5.481549426062068, 51.452869111964304 2024-07-11 12:46:03.369713 2024-07-11 12:46:03.371162
... ... ... ... ... ... ... ... ... ... ...
1048570 1048571 Sorama NoiseLevelObserved 2,03804E+18 2021-11-02 09:49:06.610 2021-11-02 09:49:06.610 58.866556 5.481549426062068, 51.452869111964304 2024-07-11 12:46:03.369713 2024-07-11 12:46:03.371162
1048571 1048572 Sorama NoiseLevelObserved 2,03804E+18 2021-11-02 09:49:07.610 2021-11-02 09:49:07.610 57.896402 5.481549426062068, 51.452869111964304 2024-07-11 12:46:03.369713 2024-07-11 12:46:03.371162
1048572 1048573 Sorama NoiseLevelObserved 2,03804E+18 2021-11-02 09:49:08.610 2021-11-02 09:49:08.610 58.556143 5.481549426062068, 51.452869111964304 2024-07-11 12:46:03.369713 2024-07-11 12:46:03.371162
1048573 1048574 Sorama NoiseLevelObserved 2,03804E+18 2021-11-02 09:49:09.610 2021-11-02 09:49:09.610 58.915390 5.481549426062068, 51.452869111964304 2024-07-11 12:46:03.369713 2024-07-11 12:46:03.371162
1048574 1048575 Sorama NoiseLevelObserved 2,03804E+18 2021-11-02 09:49:10.610 2021-11-02 09:49:10.610 59.383412 5.481549426062068, 51.452869111964304 2024-07-11 12:46:03.369713 2024-07-11 12:46:03.371162

1048575 rows × 10 columns

The table above shows the end result of the conversion for the Sorama data. With the columns types listed below

In [ ]:
df_sorama_SMD.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1048575 entries, 0 to 1048574
Data columns (total 10 columns):
 #   Column            Non-Null Count    Dtype         
---  ------            --------------    -----         
 0   id                1048575 non-null  int64         
 1   dataProvider      1048575 non-null  object        
 2   type              1048575 non-null  object        
 3   refDevice         1048575 non-null  object        
 4   dateObservedFrom  1048575 non-null  datetime64[ns]
 5   dateObservedTo    1048575 non-null  datetime64[ns]
 6   LAeq              1048575 non-null  float64       
 7   location          1048575 non-null  object        
 8   dateCreated       1048575 non-null  datetime64[ns]
 9   dateModified      1048575 non-null  datetime64[ns]
dtypes: datetime64[ns](4), float64(1), int64(1), object(4)
memory usage: 80.0+ MB

4) Generating Scripts (Load)¶

Genertating Scripts to make all the steps made in the conversion in one single script. This script can be used in a live connection in the data pipeline for uploading the data to a Smart Data Model. Right now the input is the CSV files and output also CSV files and JSON files.

4.1) Vinotion Python Script¶

In [ ]:
import pandas as pd
import numpy as np

df_vinotion_full = pd.read_csv (r'/Users/maxwellernst/Documents/data/df_vinotion_with_seconds.csv', sep=',')

# Select specific columns
df_vinotion_SMD = df_vinotion_full[['datetime_short','cameraId', 'classification', 'speed', 'uuid']].copy() 
# Convert 'dateObservedFrom' and 'dateObservedTo' columns to datetime type
df_vinotion_SMD['datetime_short'] = pd.to_datetime(df_vinotion_SMD['datetime_short'], format='%Y-%m-%d:%H:%M:%S')
# Set datetime_short as the index
df_vinotion_SMD.set_index('datetime_short', inplace=True)

# Resample to one-minute intervals and aggregate by cameraId
df_vinotion_SMD = df_vinotion_SMD.groupby('cameraId').resample('1T').agg({
    'classification': lambda x: list(x),  # Collect all classifications into a list
    'speed': 'mean',  # Calculate the average speed
    'uuid': lambda x: list(x)  # Collect all UUIDs into a list
}).copy()

# Reset index to make datetime_short a column again
df_vinotion_SMD.reset_index(inplace=True)

# Add a unique id column (starting from 1)
df_vinotion_SMD['id'] = range(1, len(df_vinotion_SMD) + 1)

# Rename datetime_short column to dateObservedFrom
df_vinotion_SMD.rename(columns={'datetime_short': 'dateObservedFrom'}, inplace=True)

# Create dateObservedTo by shifting dateObservedFrom by one minute and grouping by cameraID 
df_vinotion_SMD['dateObservedTo'] = df_vinotion_SMD.groupby('cameraId')['dateObservedFrom'].shift(-1)

#TrafficFlowObserved
df_vinotion_SMD['type'] = 'TrafficFlowObserved'

#renaming columns
df_vinotion_SMD = df_vinotion_SMD.rename(columns={'cameraId': 'source', 'classification':'vehicleType', 'speed':'averageVehicleSpeed','uuid':'name'})


# Add dateCreated and dateModified columns
df_vinotion_SMD['dateCreated'] = pd.Timestamp.now()
df_vinotion_SMD['dateModified'] = pd.Timestamp.now()

# Calculate intensity: Total number of vehicles detected during this observation period
df_vinotion_SMD['intensity'] = df_vinotion_SMD['vehicleType'].apply(len)

# Add dataProvider column
df_vinotion_SMD['dataProvider'] = 'Vinotion'

# Calculate averageHeadwayTime: Average headway time
# The average headway time represents the average time in seconds between each vehicle observed, 
# indicating how close or far each vehicle is from one another.
df_vinotion_SMD['averageHeadwayTime'] = df_vinotion_SMD.groupby('source')['dateObservedFrom'].diff().dt.total_seconds().shift(-1) / (df_vinotion_SMD['intensity'] -1)

# Replace infinite values and negative values with NaN.
df_vinotion_SMD['averageHeadwayTime'].replace([np.inf, -np.inf], np.nan, inplace=True)
df_vinotion_SMD.loc[df_vinotion_SMD['averageHeadwayTime'] < 0, 'averageHeadwayTime'] = np.nan

# Calculate average speed of the last hour
df_vinotion_SMD['avg_speed_last_hour'] = df_vinotion_SMD.groupby('source')['averageVehicleSpeed'].rolling(window=60).mean().reset_index(level=0, drop=True)

df_vinotion_SMD.set_index('dateObservedFrom', inplace=True)
df_vinotion_SMD['avg_speed_last_hour'] = df_vinotion_SMD.groupby('source')['averageVehicleSpeed'].rolling(window='3600s').mean().reset_index(level=0, drop=True)
df_vinotion_SMD.reset_index(inplace=True)

# Calculate speed decrease compared to the average speed of the last hour
df_vinotion_SMD['speed_decrease'] = (df_vinotion_SMD['averageVehicleSpeed'] - df_vinotion_SMD['avg_speed_last_hour']) / df_vinotion_SMD['avg_speed_last_hour']

# Identify congested periods based on the defined threshold
congested_threshold = 0.2  # 20% speed decrease
congested_duration_threshold = 5  # 5 minutes
df_vinotion_SMD['congested'] = (df_vinotion_SMD['speed_decrease'] <= -congested_threshold).rolling(window=congested_duration_threshold).sum() >= congested_duration_threshold


# Filter the DataFrame to display only rows where 'congested' is True
congested_true_df = df_vinotion_SMD[df_vinotion_SMD['congested'] == True]

# Drop the columns 'avg_speed_last_hour' and 'speed_decrease' since they are not needed in the SMD model
df_vinotion_SMD = df_vinotion_SMD.drop(columns=['avg_speed_last_hour', 'speed_decrease'])

# Convert the 'vehicleType' column to string
df_vinotion_SMD['vehicleType'] = df_vinotion_SMD['vehicleType'].astype(str)

# Convert the 'name' column to string
df_vinotion_SMD['name'] = df_vinotion_SMD['name'].astype(str)

# Replace empty lists of 'vehicleType' with NaN
df_vinotion_SMD['vehicleType'] = df_vinotion_SMD['vehicleType'].replace('[]', np.nan)

# Replace empty string of 'vehicleType' with NaN
df_vinotion_SMD['name'] = df_vinotion_SMD['name'].replace('[]', np.nan)

#TrafficFlowObserved SMD specified
df_vinotion_SMD['type'] = 'TrafficFlowObserved'

#re-ordering columns
df_vinotion_SMD = df_vinotion_SMD[['id','dataProvider','type','name','source','dateObservedFrom','dateObservedTo','vehicleType',
                                   'averageVehicleSpeed','intensity','averageHeadwayTime','congested','dateCreated','dateModified']]

# Save the modified DataFrame to a new CSV file (optional)
df_vinotion_SMD.to_csv('/Users/maxwellernst/Documents/data/modified_vinotion_SMD.csv', index=False)

# Export DataFrame to JSON file
df_vinotion_SMD.to_json('/Users/maxwellernst/Documents/data/modified_vinotion_SMD.json', orient='records')


# Display the DataFrame
df_vinotion_SMD
Out[ ]:
id dataProvider type name source dateObservedFrom dateObservedTo vehicleType averageVehicleSpeed intensity averageHeadwayTime congested dateCreated dateModified
0 1 Vinotion TrafficFlowObserved ['030e5136-bcb1-4162-b686-41cc63667155', '2880... wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:51:00 2021-11-02 09:52:00 ['Car', 'Car', 'Car', 'Truck', 'Car', 'Car', '... 45.112545 8 8.571429 False 2024-07-11 12:46:33.196966 2024-07-11 12:46:33.197610
1 2 Vinotion TrafficFlowObserved ['05fcbbf5-29e8-440e-a243-409740d64788', 'b0b1... wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:52:00 2021-11-02 09:53:00 ['Car', 'Car', 'Car', 'Car', 'Car', 'Car', 'Ca... 38.883007 24 2.608696 False 2024-07-11 12:46:33.196966 2024-07-11 12:46:33.197610
2 3 Vinotion TrafficFlowObserved ['de667d19-4473-474f-ad89-f674d8a73e88', 'b28d... wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:53:00 2021-11-02 09:54:00 ['Car', 'Car', 'Car', 'Car', 'Car', 'Car', 'Ca... 39.584312 28 2.222222 False 2024-07-11 12:46:33.196966 2024-07-11 12:46:33.197610
3 4 Vinotion TrafficFlowObserved ['33f2ad13-292c-47df-8cbe-3e948f11d534', 'a855... wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:54:00 2021-11-02 09:55:00 ['Car', 'Car', 'Car', 'Car', 'Car', 'Car', 'Ca... 38.379448 31 2.000000 False 2024-07-11 12:46:33.196966 2024-07-11 12:46:33.197610
4 5 Vinotion TrafficFlowObserved ['6dbbbee2-dc5d-4818-a30a-c68203ebb944', 'b7e7... wss://10.252.229.11/api/v2/data/main/object 2021-11-02 09:55:00 2021-11-02 09:56:00 ['Car', 'Car', 'Car', 'Car', 'Car', 'Car', 'Ca... 40.971777 23 2.727273 False 2024-07-11 12:46:33.196966 2024-07-11 12:46:33.197610
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
205540 205541 Vinotion TrafficFlowObserved ['9c7f1eed-9c40-46f1-95f0-27f79706dfeb', 'd7f2... wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:55:00 2021-11-30 22:56:00 ['Car', 'Car'] 56.446200 2 60.000000 False 2024-07-11 12:46:33.196966 2024-07-11 12:46:33.197610
205541 205542 Vinotion TrafficFlowObserved ['f6ab0d5f-2d85-4a19-bf20-baec498210d6', '67a4... wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:56:00 2021-11-30 22:57:00 ['Car', 'Car', 'Car'] 58.041480 3 30.000000 False 2024-07-11 12:46:33.196966 2024-07-11 12:46:33.197610
205542 205543 Vinotion TrafficFlowObserved NaN wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:57:00 2021-11-30 22:58:00 NaN NaN 0 NaN False 2024-07-11 12:46:33.196966 2024-07-11 12:46:33.197610
205543 205544 Vinotion TrafficFlowObserved ['b7a7f85f-06f0-4ef1-b4c6-34d2b9d7a934'] wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:58:00 2021-11-30 22:59:00 ['Car'] 54.285480 1 NaN False 2024-07-11 12:46:33.196966 2024-07-11 12:46:33.197610
205544 205545 Vinotion TrafficFlowObserved ['522263fa-6df0-4280-afa6-365ffc0a552b'] wss://10.252.229.15/api/v2/data/main/object 2021-11-30 22:59:00 NaT ['Car'] 54.156960 1 NaN False 2024-07-11 12:46:33.196966 2024-07-11 12:46:33.197610

205545 rows × 14 columns

4.2) Sorama Python Script¶

In [ ]:
import pandas as pd
import numpy as np

# Load the CSV file
df_sorama_full = pd.read_csv('/Users/maxwellernst/Documents/data/modified_data.csv', sep=',')


# Select specific columns and create new DataFrame
df_sorama_SMD = df_sorama_full[['deviceId', 'Content_time', 'Content_value']].copy()

# Create necessary columns with specified values
df_sorama_SMD['dateObservedFrom'] = df_sorama_SMD['Content_time'].copy()
df_sorama_SMD['dateObservedTo'] = df_sorama_SMD['Content_time'].copy()
df_sorama_SMD['type'] = 'NoiseLevelObserved'
df_sorama_SMD['dateCreated'] = pd.Timestamp.now()
df_sorama_SMD['dateModified'] = pd.Timestamp.now()
df_sorama_SMD['dataProvider'] = 'Vinotion'
df_sorama_SMD['id'] = range(1, len(df_sorama_SMD) + 1)
coordinates = "5.481549426062068, 51.452869111964304"
df_sorama_SMD['location'] = [coordinates] * len(df_sorama_SMD)

# Rename columns to match the desired format
df_sorama_SMD.rename(columns={'deviceId': 'refDevice', 'Content_value': 'LAeq'}, inplace=True)

# Convert dateObservedFrom and dateObservedTo to datetime
df_sorama_SMD['dateObservedFrom'] = pd.to_datetime(df_sorama_SMD['dateObservedFrom'], errors='coerce')
df_sorama_SMD['dateObservedTo'] = pd.to_datetime(df_sorama_SMD['dateObservedTo'], errors='coerce')

# Convert LAeq to numeric (float)
df_sorama_SMD['LAeq'] = pd.to_numeric(df_sorama_SMD['LAeq'], errors='coerce')

# Convert dataProvider to string
df_sorama_SMD['dataProvider'] = df_sorama_SMD['dataProvider'].astype(str)

#re-ordering columns
df_sorama_SMD = df_sorama_SMD[['id','dataProvider','type','refDevice', 'dateObservedFrom', 'dateObservedTo', 'LAeq','location','dateCreated','dateModified']]


# Save the modified DataFrame to a new CSV file (optional)
df_sorama_SMD.to_csv('/Users/maxwellernst/Documents/data/modified_sorama_SMD.csv', index=False)

# Export DataFrame to JSON file
df_vinotion_SMD.to_json('/Users/maxwellernst/Documents/data/modified_sorama_SMD.json', orient='records')

# Display the DataFrame
df_sorama_SMD
Out[ ]:
id dataProvider type refDevice dateObservedFrom dateObservedTo LAeq location dateCreated dateModified
0 1 Vinotion NoiseLevelObserved 2,03804E+18 2021-11-02 09:49:22.310 2021-11-02 09:49:22.310 57.077148 5.481549426062068, 51.452869111964304 2024-07-11 12:46:46.267180 2024-07-11 12:46:46.268417
1 2 Vinotion NoiseLevelObserved 2,03804E+18 2021-11-02 09:49:23.310 2021-11-02 09:49:23.310 57.639920 5.481549426062068, 51.452869111964304 2024-07-11 12:46:46.267180 2024-07-11 12:46:46.268417
2 3 Vinotion NoiseLevelObserved 2,03804E+18 2021-11-02 09:49:24.310 2021-11-02 09:49:24.310 58.501630 5.481549426062068, 51.452869111964304 2024-07-11 12:46:46.267180 2024-07-11 12:46:46.268417
3 4 Vinotion NoiseLevelObserved 2,03804E+18 2021-11-02 09:49:25.310 2021-11-02 09:49:25.310 57.684305 5.481549426062068, 51.452869111964304 2024-07-11 12:46:46.267180 2024-07-11 12:46:46.268417
4 5 Vinotion NoiseLevelObserved 2,03804E+18 2021-11-02 09:49:26.310 2021-11-02 09:49:26.310 57.416267 5.481549426062068, 51.452869111964304 2024-07-11 12:46:46.267180 2024-07-11 12:46:46.268417
... ... ... ... ... ... ... ... ... ... ...
1048570 1048571 Vinotion NoiseLevelObserved 2,03804E+18 2021-11-02 09:49:06.610 2021-11-02 09:49:06.610 58.866556 5.481549426062068, 51.452869111964304 2024-07-11 12:46:46.267180 2024-07-11 12:46:46.268417
1048571 1048572 Vinotion NoiseLevelObserved 2,03804E+18 2021-11-02 09:49:07.610 2021-11-02 09:49:07.610 57.896402 5.481549426062068, 51.452869111964304 2024-07-11 12:46:46.267180 2024-07-11 12:46:46.268417
1048572 1048573 Vinotion NoiseLevelObserved 2,03804E+18 2021-11-02 09:49:08.610 2021-11-02 09:49:08.610 58.556143 5.481549426062068, 51.452869111964304 2024-07-11 12:46:46.267180 2024-07-11 12:46:46.268417
1048573 1048574 Vinotion NoiseLevelObserved 2,03804E+18 2021-11-02 09:49:09.610 2021-11-02 09:49:09.610 58.915390 5.481549426062068, 51.452869111964304 2024-07-11 12:46:46.267180 2024-07-11 12:46:46.268417
1048574 1048575 Vinotion NoiseLevelObserved 2,03804E+18 2021-11-02 09:49:10.610 2021-11-02 09:49:10.610 59.383412 5.481549426062068, 51.452869111964304 2024-07-11 12:46:46.267180 2024-07-11 12:46:46.268417

1048575 rows × 10 columns

5) JSON Schema, JSON Input and Testing¶

From the scripts the a JSON schema is created, and an input example is used of the first row to test if the the input data will match the JSON schema

5.1) Vinotion JSON Schema¶

In [ ]:
import json
from jsonschema import validate, ValidationError

# Define the JSON schema
schema1 = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "title": "Vinotion SMD Data",
    "type": "object",
    "properties": {
        "id": { "type": "integer" },
        "dataProvider": { "type": "string" },
        "type": { "type": "string" },
        "name": {
            "type": ["array", "null"],
            "items": { "type": "string" }
        },
        "source": { "type": "string" },
        "dateObservedFrom": { "type": "string", "format": "date-time" },
        "dateObservedTo": { "type": "string", "format": "date-time" },
        "vehicleType": {
            "type": ["array", "null"],
            "items": { "type": "string" }
        },
        "averageVehicleSpeed": {
            "type": ["number", "null"]
        },
        "intensity": { "type": "integer" },
        "averageHeadwayTime": {
            "type": ["number", "null"]
        },
        "congested": { "type": "boolean" },
        "dateCreated": { "type": "string", "format": "date-time" },
        "dateModified": { "type": "string", "format": "date-time" }
    },
    "required": [
        "id",
        "type",
        "dateObservedFrom",
        "dateObservedTo",
    ]
}

5.2) Sorama JSON Schema¶

In [ ]:
# Define the JSON schema
schema2 = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "title": "Sorama SMD Data",
    "type": "object",
    "properties": {
        "id": {
            "type": "integer"
        },
        "dataProvider": {
            "type": "string"
        },
        "type": {
            "type": "string"
        },
        "refDevice": {
            "type": "string"
        },
        "dateObservedFrom": {
            "type": "string",
            "format": "date-time"
        },
        "dateObservedTo": {
            "type": "string",
            "format": "date-time"
        },
        "LAeq": {
            "type": "number"
        },
        "dateCreated": {
            "type": "string",
            "format": "date-time"
        },
        "dateModified": {
            "type": "string",
            "format": "date-time"
        },
        "location": {
            "type": "string",
            "format": "point"
        }
    },
    "required": ["id",
                 "type", 
                 "dateObservedFrom", 
                 "dateObservedTo", 
                 "location"]
}

6) Testing JSON Schema and inputs¶

6.1) Vinotion JSON Test¶

In [ ]:
# Sample input to validate
input_data = {
    "id": 1,
    "dataProvider": "Vinotion",
    "type": "TrafficFlowObserved",
    "name": ["030e5136-bcb1-4162-b686-41cc63667155", "2880..."],
    "source": "wss://10.252.229.11/api/v2/data/main/object",
    "dateObservedFrom": "2021-11-02T09:51:00",
    "dateObservedTo": "2021-11-02T09:52:00",
    "vehicleType": ["Car", "Car", "Car", "Truck", "Car", "Car", "..."],
    "averageVehicleSpeed": 45.112545,
    "intensity": 8,
    "averageHeadwayTime": 8.571429,
    "congested": False,
    "dateCreated": "2024-07-02T11:36:32.281404",
    "dateModified": "2024-07-02T11:36:32.282731"
}
In [ ]:
# Validate the input against the schema
try:
    validate(instance=input_data, schema=schema1)
    print("Input is valid.")
except ValidationError as err:
    print("Input is invalid:", err.message)
Input is valid.

The Vinotion input matches the schema.

6.2) Sorama JSON Test¶

The input is used from the first row of the dataset, to test if the input can match the schema.

In [ ]:
# Define the input JSON
input_json ={
    "id": 1,
    "dataProvider": "Vinotion",
    "type": "NoiseLevelObserved",
    "refDevice": "2,03804E+18",
    "dateObservedFrom": "2021-11-02T09:49:22.310Z",
    "dateObservedTo": "2021-11-02T09:49:22.310Z",
    "LAeq": 57.077148,
    "dateCreated": "2024-05-15T10:49:29.228556Z",
    "dateModified": "2024-05-15T10:49:29.228556Z",
    "location": "5.481549426062068, 51.452869111964304"
}
In [ ]:
# Validate the input JSON against the schema
try:
    validate(instance=input_json, schema=schema2)
    print("Input JSON is valid.")
except ValidationError as e:
    print(f"Input JSON is invalid: {e.message}")
Input JSON is valid.

The Sorama JSON input data matches the schema as shown in the result above.

7) Conclusion¶

This notebook concludes the MIM2 data models and sharing for Eindhoven's Inzicht Verlicht Data Model. Although there are improvements to be made but because of time constraints they were noted down for future iterations below in the Recommendations. This ETL process shows a tangible solution of MIM2 and this proved to be useful in creating the Tutorial on how to create MIM2 models. There were many steps involved in converting the data, which seemed to be simple but took a good understanding of the data and communication with the stakeholders in order to create a reliable result.

Comments were added to ensure that the decisions and assumptions made were noted for future iterations. Ideally the scripts made would be used with a live connection to the devices and sensors, and the data would be uploaded to a Data Space through an API.

8) Recommendations¶

Must-haves :

  • Locations of the Sorama sound sensors, the locations are currently aquired manually through google maps and the image of the intersection, this could be done better with the actual locations when thinking of a live connection. Also there is currently only two sensors available, and the ID row values are rounded off based on the orginal CSV values, which could be an issue later on.

  • refRoadSegment from Smart Data Model : TrafficFlowObserved should be added, which is the RuleID + RealDirection in the form of a URL type.

Nice-to-haves :

  • LAmax and LAeq_d columns can be added to the Sorama CSV if the data is aggregated is larger intervals such as 1 minute instead of 1 second since it is already aggregated to 1 second intervals, the LAqMax would be the same as the LAq values.

  • Look into the NoisePollution Smart Data Model for Sorama data conversion.

  • Can still convert the Air Quality data to the Smart Data Model of AirQualityObserved.

  • Look into NDW public road data for Eindhoven to maybe enrich the data more. Eindhoven prefer the NDW public data over the BGT national open data.

  • Look into data from OpenStreetMap.

Example of refDevice or Device ID's that are in the current data, which are rounded off, and are only two.

In [ ]:
unique_values = df_sorama_SMD['refDevice'].unique()
print(unique_values)
['2,03804E+18' '1,46158E+18']